I am trying to reproduce the following approach (from the LRCN paper) in Keras:

"LRCN is trained to predict the video's activity class at each time step. To produce a single label prediction for an entire video clip, we average the label probabilities (the outputs of the network's softmax layer) across all frames and choose the most probable label."
But I am quite new to LSTMs and am not sure which metrics and loss function to use to replicate the method described above.
So far I have an LSTM that returns sequences, and I feed its outputs into a time-distributed dense layer with 3 classes.
A "frame" corresponds to a timestep of the RNN, and return_sequences=True lets me return a prediction per frame.
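For context, here is a minimal sketch of what I have so far (the layer size, number of frames, and feature dimension are placeholders, not values from the paper):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, TimeDistributed, Dense

num_frames, num_features, num_classes = 30, 4096, 3  # placeholder shapes

model = Sequential()
# return_sequences=True keeps the full time axis, giving one output per frame
model.add(LSTM(256, return_sequences=True, input_shape=(num_frames, num_features)))
# per-frame class probabilities
model.add(TimeDistributed(Dense(num_classes, activation='softmax')))

# At inference time the idea is to average per-frame probabilities over the
# time axis and take the argmax for the whole clip:
# clip_probs = model.predict(clip).mean(axis=1)
# clip_label = clip_probs.argmax(axis=-1)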
Could you please tell me which metrics and loss I need and if I need custom ones also?
From the paper itself you can see that the work is fairly old (May 2016), so consider looking at more recent work with more specifics.
The paper gives no clue about the LSTM details apart from the number of units used, so you may have better luck finding models whose metrics and losses are explained explicitly.
I have built a stock prediction model using an LSTM. However, every time I run the program, the RMSE value and the prediction result change, even though I did not change any data in the program; it gives a different result every time I click the run button. Can anyone tell me the reason for this? Thank you very much.
I would suggest learning more about layers and some other basics of neural networks first.
How does a neural network learn?
A neural network contains three types of layers: input, output, and hidden layers. All these layers contain neurons (or nodes, if you like). Every layer's neurons are connected to the neurons of the previous and the next layer.
You can call these connections 'paths'. Every path has a weight value. A neuron's input is calculated by summing, over the previous layer's neurons, the product of each neuron's output and the corresponding path's weight. That sum is then passed through an activation function. You can learn more about this from online classes or tutorials.
My point is that the prediction depends entirely on those weights, and the weight values keep changing during training, depending on the learning rate and other factors. What about the very beginning, at epoch 1? The model generates random initial weights for all the paths, then keeps updating those values during training to minimize the loss.
Every time you run training, it generates new random initial values; that is why you get different results each time. If you fix those values by setting a random seed (e.g. tf.random.set_seed) or some other method, you will get reproducible results. By the way, you don't need to retrain every time: save your model weights, then load them whenever you need to predict. You will get the same result every time you load the saved weights and use that model to predict.
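For example, a minimal sketch (the seed value, layer sizes, and file name are arbitrary):

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Fix the sources of randomness before building and training the model
np.random.seed(42)
tf.random.set_seed(42)

timesteps = 60  # placeholder window length
model = Sequential()
model.add(LSTM(64, input_shape=(timesteps, 1)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
# model.fit(X_train, y_train, epochs=10)

# Save the trained weights once, then reuse them instead of retraining
model.save_weights('lstm.weights.h5')
# Later, rebuild the same architecture and reload; predictions will then be
# identical on every run
model.load_weights('lstm.weights.h5')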
There are many sources of randomness in machine learning (described in link below).
https://machinelearningmastery.com/randomness-in-machine-learning/
In this case it is probably "randomness in the algorithm", even if you make sure to feed exactly the same data in the same order.
I am working on a sequence prediction problem where my inputs are of size (numOfSamples, numOfTimeSteps, features), where each sample is independent, the number of time steps is uniform across samples (after pre-padding the length with 0's using keras.pad_sequences), and the number of features is 2. To summarize my question(s): I am wondering how to structure my Y-label dataset to feed the model, and I want to gain some insight into how to properly structure my model to output what I want.
My first feature is a categorical variable encoded as a unique integer, and my second is numerical. I want to be able to predict the next categorical value as well as an associated feature2 value, and then feed this back into the network to predict a sequence until the EOS category is output.
This is a main source I've been referencing to try and understand how to create a generator for use with keras.fit_generator.
[1]
There is no confusion about how the mini-batch for the "X" data is grabbed, but for the "Y" data I am not sure about the proper format for what I am trying to do. Since I am trying to predict a category, I figured a one-hot vector representation of the t+1 timestep would be the proper way to encode the first feature (I guess resulting in a 4-dimensional numpy array?), but I'm rather lost on how to deal with the second, numerical feature.
Now, this leads me to questions about the architecture and how to structure a model to do what I want. Does the following architecture make sense? I believe there is something I am missing or not understanding.
Proposed architecture (parameters loosely filled in, nothing set yet):
from keras.models import Sequential
from keras.layers import Masking, LSTM, TimeDistributed, Dense, Activation

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))  # skip the padded timesteps
model.add(LSTM(hidden_size, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])
model.fit_generator(...)  # I'll figure this out
So, at the end, a softmax activation can predict the next categorical value for feature1. How do I also output a value for feature2 so that I can feed the new prediction for both features back as the next time-step? Do I need some sort of parallel architecture with two LSTMs that are combined somehow?
This is my first attempt at doing anything with neural networks or Keras, and I would not say I'm "great" at Python, though I can get by. However, I feel I have a decent grasp of the fundamental theoretical concepts but lack the practice.
This question is somewhat open-ended, and I encourage you to pick apart my current strategy.
Once again, the overall goal is to predict both features (categorical, numeric) in order to predict "full sequences" from intermediate-length sequences.
For example, I train on these padded max-length sequences, but in production I want to use the model to predict the remaining, currently unseen time steps, which would be of variable length.
Okay, so if I understand you properly (correct me if I'm wrong), you would like to predict the next features based on the current ones.
When it comes to the categorical variable, you are on point: your Dense layer should output an N-1 vector containing the probability of each class (while we are at it, if you happen to use pandas.get_dummies, remember to specify the argument drop_first=True; a similar approach should be employed with whatever you use for one-hot encoding).
In addition to that N-1 output vector for each sample, the network should output one more number for the numerical value.
Remember to output logits (no activation; don't use softmax at the end like you currently do). Afterwards, the network output should be split: the N-1 part (your categorical feature) is passed to a loss function able to handle logits (e.g. in TensorFlow that is tf.nn.softmax_cross_entropy_with_logits_v2, which applies a numerically stable softmax for you).
The N-th element of the network output should be passed to a different loss, probably mean squared error.
Based on those two loss values (you could take the mean of both to obtain a single loss value), you backpropagate through the network, and it might do just fine.
Unfortunately, I'm not skilled enough in Keras to help you with the code, but I think you will figure it out yourself. While we're at it, I would suggest PyTorch for more customized neural networks (I think yours fits this description), though it's definitely doable in Keras as well; your choice.
An additional, possibly helpful thought: you may want to look into teacher forcing for this kind of task. More on the topic and the theory behind it can be found in the excellent Deep Learning Book, and a code example (though in PyTorch once again) can be found in their docs here.
By the way, interesting idea; mind if I use it in connection with my current research (with kudos going to you, of course)? Comment on this answer if so, and we can talk it out in chat.
Basically every answer I was looking for was demonstrated and explained in this tutorial. It is an absolutely great resource for understanding how to model multi-output networks, and it goes through a lengthy walkthrough of a multi-output CNN architecture. It only took me about three weeks to stumble upon it, however.
https://www.pyimagesearch.com/2018/06/04/keras-multiple-outputs-and-multiple-losses/
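The core pattern, roughly as the tutorial and the answer above describe it, looks something like this (a minimal sketch; the layer sizes, output names, and loss weights are placeholders, not taken from the tutorial):

from tensorflow.keras.layers import Input, Masking, LSTM, Dense
from tensorflow.keras.models import Model

timesteps, features, vocab_size, hidden_size = 50, 2, 100, 64  # placeholder sizes

inputs = Input(shape=(timesteps, features))
x = Masking(mask_value=0.)(inputs)
x = LSTM(hidden_size)(x)

# Two output heads sharing the same LSTM encoder
category_out = Dense(vocab_size, activation='softmax', name='category')(x)
value_out = Dense(1, activation='linear', name='value')(x)

model = Model(inputs=inputs, outputs=[category_out, value_out])
# One loss per head; Keras combines them (optionally weighted) into one training loss
model.compile(optimizer='adam',
              loss={'category': 'categorical_crossentropy', 'value': 'mse'},
              loss_weights={'category': 1.0, 'value': 1.0},
              metrics={'category': 'categorical_accuracy', 'value': 'mae'})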
I am new to Keras and deep learning, and I am not quite sure about the right way to add regularization. I wrote a CNN autoencoder using the functional Model API, and right now I add a regularizer in each "Conv2D" Keras layer. I am not sure if this is the right place to add regularization; could anyone please give me some suggestions?
(I ran the training and checked the reconstructed test images; they are OK but not very good. I use MNIST for testing, and the strokes of the reconstructed MNIST digits are thicker than in the originals.)
In my problem, the input image is an impaired one and the original, clean image is used as the training label. The output image of the CNN is compared with the label image using "mean absolute error", which I use both as the loss and as the metric.
I first defined three functions: a downsampling function, an upsampling function, and a function that squeezes the third dimension of the matrix to get a two-dimensional matrix as the output.
My code is too long to post in full, so only part of it is included here to help illustrate the problem.
With those three functions defined, I defined the model (not in detail, just the part needed to explain my problem).
I then load all necessary parameters into the model, define the optimizer parameters, and compile the model.
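To make it concrete, this is the kind of placement I mean (a minimal sketch, not my actual layers; the filter counts and the l2 factor are arbitrary):

from tensorflow.keras import regularizers
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Input
from tensorflow.keras.models import Model

inputs = Input(shape=(28, 28, 1))
# The regularizer is attached per layer, here as a penalty on the kernel weights
x = Conv2D(16, (3, 3), activation='relu', padding='same',
           kernel_regularizer=regularizers.l2(1e-4))(inputs)
x = MaxPooling2D((2, 2), padding='same')(x)
# ... more encoder/decoder layers built the same way ...
model = Model(inputs, x)
model.compile(optimizer='adam', loss='mean_absolute_error', metrics=['mae'])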
Recently I have been using an LSTM to predict time series. I'm using Keras 2.0 to construct my LSTM model. It has a structure like this:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(LSTM(128, input_shape=(timesteps, 1), return_sequences=False, stateful=False))
model.add(Dropout(rate=0.1))
model.add(Dense(1))
I have tried to use this network to predict several time series, including sin(t) and a real traffic flow dataset. I found that the prediction for the sine is fine, while the prediction for the real dataset looks just like the last input value shifted by one step. I don't know whether it's a prediction error or whether the network isn't learning the pattern of the dataset at all. Has anyone seen similar results? Are there any solutions to this annoying shift? Thanks a lot.
Here are some of my predictions:
[Figure: prediction result on a sine signal with 3 frequencies]
[Figure: prediction result on the real traffic flow dataset]
This is simply the starting point for your network and you'll have to work through it by trying various things.
To name only a few:
Try different window lengths (the number of timesteps fed into the network)
Try adding dense layers, or multiple LSTM layers, or fewer LSTM units (see the sketch after this list)
Try different optimizers, with various learning rates
Look for additional datapoints to feed into the network
How much data do you have? You may need more to get a good prediction
Try different offsets for the Y variable: how many timesteps ahead do you need to predict for your specific problem?
The list goes on....
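As an illustration of a couple of these points, here is a minimal sketch of a stacked-LSTM variant you could experiment with (the window length, layer sizes, and dropout rate are arbitrary starting points, not recommendations):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

window = 48  # try different window lengths (timesteps fed into the network)

model = Sequential()
# Two stacked LSTM layers instead of one; the first must return sequences
model.add(LSTM(64, return_sequences=True, input_shape=(window, 1)))
model.add(LSTM(32))
model.add(Dropout(0.1))
model.add(Dense(1))  # predicts the value some chosen offset ahead of the window
model.compile(optimizer='adam', loss='mse')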
This is my dataframe
https://drive.google.com/file/d/1qAnyOkp_YayqzZ4i0CwqCTDiYTIOmv6I/view?usp=sharing
I need to predict the value of ra, the last column of that dataset, via an ANN.
I have used the Keras library to build it; here is my code:
https://gist.github.com/anonymous/9955247ad7341e5bc119556dead9fc71
But the y_pred variable contains all 0s in its output. Am I doing anything wrong with the activation function?
I need to predict the ra values using the training dataset.
P.S.: I am a newbie to data science and have just started learning it via Udemy.
You can remove the second hidden layer, as a simple ANN is enough for this, and you don't need an activation function at the output layer since it is a regression problem.
Please see the sample code at https://github.com/naveenkambham/MachineLearningModels/blob/master/NeuralNetwork.py, which is similar to your requirement.
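For illustration, a minimal sketch of that idea (the layer width, number of input features, and training settings are placeholders, not taken from the linked repository):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

n_features = 8  # number of input columns (everything except ra)

model = Sequential()
model.add(Dense(16, activation='relu', input_shape=(n_features,)))
# Single hidden layer; the output layer has no activation (linear) for regression
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
# model.fit(X_train, y_train, epochs=100, batch_size=32)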