I would like to train a network with multiple output layers.
in->hidden->out 1
->out 2
Is this possible? If so how do I setup the datasets and trainer to accomplish training.
As you are looking into splitting your output in order to have several SoftMax regions, you can use PartialSoftmaxLayer provided by PyBrain.
Note that it is limited to slices of the same length, but its code can inspire you if you require a custom output layer:
https://github.com/pybrain/pybrain/blob/master/pybrain/structure/modules/softmax.py
No. You can have multiple hidden layers, like this
in -> hidden 1 -> hidden 2 -> out
Alternatively, you can have multiple output neurons (in a single output layer).
Technically, you can set up any arrangement of neurons and layers, connect them however you like, and call them whatever you want, but the above is the general way of doing it.
It would be more work for you as the programmer, but if you want to have two different outputs, you can always concatenate your outputs into one vector and use that as the output for the network.
in --> hidden --> concatenate([out1, out2])
A possibly significant drawback of this approach is that if the two outputs are of different scales, then concatenation will distort the error metric you use to train the network.
However, if you were able to use two separate outputs, then you'd still need to solve this problem, likely by somehow weighting the two error metrics that you use.
Potential solutions to this problem could include defining a custom error metric (e.g., by using a variant of weighted squared error or weighted cross-entropy) and/or standardizing the two output datasets so that they exist in a common scale.
Related
We now have a trained network for classification task. The top of the network is like
so the layer relu_fc1 is something like extracted features, then softmax to class prediction.
Now we want to extract these features directly. In normal case, we can do it by
y = sess.graph.get_tensor_by_name('relu_fc1:0') sess.run(y,...)
That's great, but we still want to make it faster, so we use TensorRT to convert the saved model. However, after the conversion, we can't get the right tensor in the relu_fc1 because TensorRT mixed the operation up and produced something like TRTENgineOp_1.
I want to know is there a way to get the intermediate layer's output after TensorRT? I guess maybe it's easier for us can delete the last layers in the network then do the conversion, but can't find practical materials for removing the layers in tensorflow.
I want to know is there a way to get the intermediate layer's output after TensorRT? I guess maybe it's easier for us can delete the last layers in the network then do the conversion, but can't find practical materials for removing the layers in tensorflow.
For this question, when you do the tf-to-onnx conversion, you can specify which layer as the final output for the onnx model. Then, you can do the onnx-to-tensorrt conversion.
For more details, see tensorflow-onnx. The --outputs parameter is what you want.
I am using experimental data with Keras LSTM to model a complicated physical system. The problem is output value tends to change drastically between two points at certain points. All physical systems must show some continuous/smooth behavior. How can I make my output smoother, is there some kind of layer or regularization?
I tried introducing l1-l2 regularization, drop-outs... They help but I could not get good results. What I seek is some kind of layer which limits sudden changes in the values. By the way I work with a rather small amount of data; I am using 2 series to train and validify, 1 to test.
Network structure: I get similar results for 2 LSTM + 1 Dense layer or 1 LSTM + 1 Dense layer. (With/without dropout layers between LSTM and Dense, and some l2 regularization)
The time-series data represents some measurements. Measurements are taken in short intervals, resulting in repeated values time to time. I remove some of the repeated lines as well.(I concatenated them together and then removed the rows with respect to one of the inputs. I tried doing it for several inputs. But as you can understand, I did not remove all the repeated lines with this approach, can this be the source of the problem?)
I use sklearn.StandardScaler or sklearn.MinMaxScaler to normalize the input data, not much difference between the two.
You can see a sample result on test data, which has l2 regularization - please note the first two peaks in the start. There is around 20,000 points in the graph and these peaks occur over 3-5 points. In the training set there are some jumps as well but they are far more smooth and spread out. Is there someway to smoothen the output within the neural net, without adding some external filters?
I am working on a sequence prediction problem where my inputs are of size (numOfSamples, numOfTimeSteps, features) where each sample is independent, number of time steps is uniform for each sample (after pre-padding the length with 0's using keras.pad_sequences), and my number of features is 2. To summarize my question(s), I am wondering how to structure my Y-label dataset to feed the model and want to gain some insight on how to properly structure my model to output what I want.
My first feature is a categorical variable encoded to a unique int and my second is numerical. I want to be able to predict the next categorical variable as well as an associated feature2 value, and then use this to feed back into the network to predict a sequence until the EOS category is output.
This is a main source I've been referencing to try and understand how to create a generator for use with keras.fit_generator.
[1]
There is no confusion with how the mini-batch for "X" data is grabbed, but for the "Y" data, I am not sure about the proper format for what I am trying to do. Since I am trying to predict a category, I figured a one-hot vector representation of the t+1 timestep would be the proper way to encode the first feature, I guess resulting in a 4? Dimensional numpy matrix?? But I'm kinda lost with how to deal with the second numerical feature.
Now, this leads me to questions concerning architecture and how to structure a model to do what I am wanting. Does the following architecture make sense? I believe there is something missing that I am not understanding.
Proposed architecture (parameters loosely filled in, nothing set yet):
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(hidden_size, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])
model.fit_generator(...) #ill figure this out
So, at the end, a softmax activation can predict the next categorical value for feature1. How do I also output a value for feature2 so that I can feed the new prediction for both features back as the next time-step? Do I need some sort of parallel architecture with two LSTMs that are combined somehow?
This is my first attempt at doing anything with neural networks or Keras, and I would not say I'm "great" at python, I can get by though. However, I feel I have a decent grasp at the fundamental theoretical concepts, but lack the practice.
This question is sorta open ended, with encouragement to pick apart my current strategy.
Once again, the overall goal is to predict both features (categorical, numeric) in order to predict "full sequences" from intermediate length sequences.
Ex. I train on these padded max-len sequences, but in production I want to use this to predict the remaining part of the currently unseen time-steps, which would be variable length.
Okay, so If I understand you properly (correct me if I'm wrong) you would like to predict next features based on the current ones.
When it comes to categorical variables, you are on point, your Dense layer should output N-1 vector containing probability of each class (while we are at it, if you, by any chance, use pandas.get_dummies remember to specify argument drop_first=True, similiar approach should be employed whatever you are using for one-hot encoding).
Except those N-1 output vector for each sample, it should output one more number for numerical value.
Remember to output logits (no activation, don't use softmax at the end like you currently do). Afterwards network output should be separated into N-1 part (your categorical feature) and passed to loss function able to handle logits (e.g. in Tensorflow it is tf.nn.softmax_cross_entropy_with_logits_v2 which applies numerically stable softmax for you).
Now, your N-th element of network output should be passed to different loss, probably Mean Squared Error.
Based on loss value of those two losses (you could take a mean of both to obtain one loss value), you backpropagate through the network and it might do just fine.
Unfortunately I'm not skilled enough in Keras in order to help you with the code, but I think you will figure it out yourself. While we're at it, I would like to suggest PyTorch for more custom neural networks (I think yours fits this description), though it's definitely doable in Keras as well, your choice.
Additional 'maybe helpful' thought: you may check Teacher Forcing for your kind of task. More on the topic and theory behind it can be found in the outstanding Deep Learning Book and code example (though in PyTorch once again), can be found in their docs here.
BTW interesting idea, mind if I use it in connection with my current research trajectory (with kudos going to you of course)? Comment on this answer if so we can talk it out in chat.
Basically every answer I was looking for was exampled and explained in this tutorial. Absolutely great resource for trying to understand how to model multi-output networks. This one goes through a lengthy walkthrough of a multi-output CNN architecture. It only took me about three weeks to stumble upon, however.
https://www.pyimagesearch.com/2018/06/04/keras-multiple-outputs-and-multiple-losses/
I have recently started working on ECG signal classification in to various classes. It is basically multi label classification task (Total 4 classes). I am new to Deep Learning, LSTM and Keras that why i am confused in few things.
I am thinking about giving normalized original signal as input to the network, is this a good approach?
I also need to understand training input shape for LSTM as ECG signals are of variable length (9000 to 18000 samples) and usually classifier need fixed variable input. How can i handle such type of input in case of LSTM.
Finally what should be structure of deep LSTM network for such lengthy input and how many layers should i use.
Thanks for your time.
Regards
I am thinking about giving normalized original signal as input to the network, is this a good approach?
Yes this is a good approach. It is actually quite standard for Deep Learning algorithms to give them your input normalized or rescaled.
This usually helps your model converge faster, as now you are inside smaller range (i.e.: [-1, 1]) instead of greater un-normalized ranges from your original input (say [0, 1000]). It also helps you get better, more precise results, as it helps solve problems like the vanishing gradient as well as adapting better to modern activation and optimizer functions.
I also need to understand training input shape for LSTM as ECG signals are of variable length (9000 to 18000 samples) and usually classifier need fixed variable input. How can i handle such type of input in case of LSTM.
This part is really important. You are correct, LSTM expects to receive inputs with a fixed shape, one that you know beforehand (in fact, any Deep Learning layer expects fixed shape inputs). This is also explained in the keras docs on Recurrent Layers where they say:
Input shape
3D tensor with shape (batch_size, timesteps, input_dim).
As we can see, it expects your data to have a number of timesteps as well as a dimension on each one of those timesteps (batch size is usually 1). To exemplify, suppose your input data consists of elements like: [[1,4],[2,3],[3,2],[4,1]]. Then, using a batch_size of 1, the shape of your data would be (1,4,2). As you have 4 timesteps, each with 2 features.
So bottom line, you have to make sure that you pre-process you data so it has a fixed shape you can then pass to your LSTM layers. This one you will have to find out by yourself, as you know your data and problem better than we do.
Maybe you can fix the samples you obtain from your signal, discarding some and keeping others so every signal is of the same length (if you say your signals are between 9k and 18k choosing 9000 could be the logical choice, discarding samples from the others you get). You could even do some other conversion to your data in a way that you can map from inputs of 9000-18000 to a fixed size.
Finally what should be structure of deep LSTM network for such lengthy input and how many layers should i use.
This one is really quite broad and doesn't have a unique answer. It would depend on the nature of your problem, and determining those parameters a priori is not so straightforward.
What I recommend you do is to start with a simple model first, and then add layers and blocks (neurons) incrementally until you are satisfied with the results.
Try just one hidden layer first, train and test your model and check your performance. You can then add more blocks and see if your performance improved. You can also add more layers and check for the same until you are satisfied.
This is a good way to create Deep Learning models, as you will arrive to the results you want while keeping your Network as lean as possible, which in turn helps your execution time and complexity. Good luck with your coding, hope you find this useful.
I am faced with this problem:
I have to build an FFNN that has to approximate an unknown function f:R^2 -> R^2. The data in my possession to check the net is a one-dimensional R vector. I know the function g:R^2->R that will map the output of the net into the space of my data. So I would use the neural network as a filter against bias in the data. But I am faced with two problems:
Firstly, how can I train my network in this way?
Secondly, I am thinking about adding an extra hidden layer that maps R^2->R and lets the net train itself to find the correct maps and then remove the extra layer. Would this algorithm be correct? Namely, would the output be the same that I was looking for?
Your idea with additional layer is good, although the problem is, that your weights in this layer have to be fixed. So in practise, you have to compute the partial derivatives of your R^2->R mapping, which can be used as the error to propagate through your network during training. Unfortunately, this may lead to the well known "vanishing gradient problem" which stopped the development of NN for many years.
In short - you can either manually compute the partial derivatives, and given expected output in R, simply feed the computed "backpropagated" errors to the network looking for R^2->R^2 mapping or as you said - create additional layer, and train it normally, but you will have to make the upper weights constant (which will require some changes in the implementation).