I want to write a lambda function to use as a layer in my neural network that comes after the softmax prediction, to change it according to my needs.
The input into the layer will be of shape (1,S,X). The first value along X is the score, all the others are classes in a multi-class classification.
For neighbouring positions along S (S^n and S^n+1), I want the values that are close in position, i.e. the last 10 values along X of S^n and the first 10 values along X of S^n+1, to be compared using the score value in X, so that the values belonging to the segment with the smaller score are set to 0 while the others are kept.
I am new to the Keras backend and I am unsure how to tackle this issue; any advice is welcome.
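If I read the requirement correctly, a minimal sketch of such a Lambda layer could look like the one below. This assumes a TensorFlow backend, that the class axis is longer than 2*WINDOW, and that ties go to the left segment; the window size of 10 and the function/layer names are my own placeholders, not anything fixed by the question.

import tensorflow as tf
from tensorflow.keras.layers import Lambda

WINDOW = 10  # number of overlapping class positions at each segment boundary (assumption)

def suppress_weaker_overlap(x):
    # x has shape (1, S, X); x[..., 0] is the score, x[..., 1:] are the class values
    scores = x[:, :, 0]                                               # (1, S)
    left_wins = tf.cast(scores[:, :-1] >= scores[:, 1:], x.dtype)     # (1, S-1): 1 where S^n beats S^n+1
    # a segment keeps its last WINDOW class values only if it wins against its right neighbour;
    # the last segment has no right neighbour, so it always keeps them
    tail_keep = tf.concat([left_wins, tf.ones_like(scores[:, :1])], axis=1)        # (1, S)
    # a segment keeps its first WINDOW class values only if it wins against its left neighbour
    head_keep = tf.concat([tf.ones_like(scores[:, :1]), 1.0 - left_wins], axis=1)  # (1, S)
    score  = x[:, :, :1]
    head   = x[:, :, 1:1 + WINDOW] * head_keep[:, :, None]
    middle = x[:, :, 1 + WINDOW:-WINDOW]
    tail   = x[:, :, -WINDOW:] * tail_keep[:, :, None]
    return tf.concat([score, head, middle, tail], axis=-1)

masking_layer = Lambda(suppress_weaker_overlap)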
I have a working CNN-LSTM model that tries to predict keypoints of human body parts in videos.
Currently, I have four keypoints as labels: right hand, left hand, head and pelvis.
The problem is that on some frames I can't see all four parts of the human that I want to label, so by default I set those values to (0,0) (a null coordinate).
The problem I faced was that the model took those points into account and tried to regress on them while being in a sequence.
Thus, I removed the (0,0) points from the loss calculation and the gradient backpropagation, and it works much better.
The problem is that the four points are still predicted, so I am trying to find a way to make the model predict a variable number of keypoints.
I thought of adding a third parameter (is it visible?), but it will probably add some complexity and hurt the model.
I think that you'll have to write a custom loss function that computes the loss between points only when the target coordinates are not null.
See PyTorch custom loss function on writing custom losses.
Something like:
import torch
import torch.nn.functional as F

def loss(outputs, labels):
    # outputs, labels: sequences of (x, y) keypoint coordinates
    err = 0.0
    n = 0
    for xo, xt in zip(outputs, labels):
        if torch.all(xt == 0):  # (0, 0) is the null coordinate: skip it
            continue
        err = err + F.mse_loss(xo, xt)
        n += 1
    return err / n
This is pseudo-code only! An alternative form that avoids the loop is to have an explicit binary mask vector (as suggested by @leleogere) that you can then multiply with the per-coordinate loss before reducing.
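For instance, a minimal sketch of that vectorized form, assuming outputs and labels are tensors of shape (batch, n_keypoints, 2) and that the function name is just a placeholder:

import torch

def masked_mse_loss(outputs, labels):
    # 1 where the keypoint is labelled, 0 where the target is the (0, 0) null coordinate
    visible = (labels.abs().sum(dim=-1, keepdim=True) != 0).float()   # (batch, n_keypoints, 1)
    sq_err = (outputs - labels) ** 2 * visible                        # zero out the null keypoints
    n_terms = visible.expand_as(sq_err).sum().clamp(min=1.0)          # number of coordinates kept
    return sq_err.sum() / n_terms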
I am trying to write a model that outputs a vector of length N consisting of the labels -1, 0 and 1. Each label represents one of three decisions for the system participants (wireless devices), so the vector describes a system state that is then passed on to an optimization problem in the next step. Because of the fixed problem formulation that consumes this output vector, choosing 0, 1 and 2 instead is not possible.
After coming across this tanh function to supply the -1, 0 and 1 values:
1.5 * backend.tanh(alpha * x) + 0.5 * (backend.tanh(-(3 / alpha) * x)) from here, I was wondering how exactly this output layer and the penultimate layer can be built to supply this vector of labels {-1,0,1}. I tried using the above function in the output layer of a simple Iris classifier, but this resulted in terrible accuracy compared to the one achieved with 0, 1, 2 and a softmax output layer.
Thanks in advance,
with kind regards,
Yuka
It doesn't seem like the outputs are actually "numerically related", for lack of a better term. Meaning, the labels could just as well be "left", "right", "up". So I think your best bet is to have 3 output nodes in the final layer with a softmax activation function, each of the three nodes representing one of the three labels, trained with a cross-entropy loss function.
If your training data currently has the target as -1/0/1, you should one-hot encode it so that each target is a vector of length 3. So label 0 might become [0,1,0].
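A minimal sketch of that setup (the layer sizes, n_features and the example targets are placeholders, not values from the question):

import numpy as np
from tensorflow import keras

n_features = 4                                               # placeholder input width
labels = np.array([-1, 0, 1, 1, -1])                         # example targets in the original {-1, 0, 1} coding
y = keras.utils.to_categorical(labels + 1, num_classes=3)    # shift to {0, 1, 2}, then one-hot

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(n_features,)),
    keras.layers.Dense(3, activation="softmax"),             # one node per label
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# after training, map predictions back to {-1, 0, 1}:
# predicted = model.predict(x).argmax(axis=-1) - 1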
I want to use an autoencoder for dimension reduction in Keras. The input is a table with discrete values 0, 1, 2, 3, 4 in the columns (each of these numbers shows a category). Each subject has a label 0/1 to show sick/healthy. Now I have two questions:
Which activation function should I use in the last layer? Shall I use a combination of sigmoid and ReLU?
I don't know whether this kind of input variable needs normalization (and if the answer is yes, how?)
Which activation function should I use in the last layer? Shall I use a combination of sigmoid and ReLU?
The activation in the last layer should be sigmoid, and you should use the binary_crossentropy loss function for training.
I don't know whether this kind of input variable needs normalization (and if the answer is yes, how?)
It depends on the nature of the discrete values you mentioned. As you know, the inputs to a neural network represent the "intensity" of each neuron; higher values mean the neuron is more intensive/active. So categorical values as input to a NN only make sense if they map to a continuous range. For example, if excellent=3, good=2, bad=1, terrible=0, it's okay to feed these values to a NN because it makes sense to calculate f(wx+b) (the intensity of the neuron), as a value of 1.5 means somewhere between bad and good.
However, if the categorical values are purely nominal values without any relationship between them (for example: apple=1, orange=2, banana=3), it really doesn't make sense to calculate f(wx+b). In this case, what does the value 1.5 mean? For this type of data as input to a NN, you should convert them to a binary encoding. For example, if you have only 3 fruits you can encode them this way:
apple = [1, 0, 0]
orange = [0, 1, 0]
banana = [0, 0, 1]
For this binary conversion, Keras has a utility function: to_categorical.
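For example, a quick sketch that reproduces the fruit encoding above (the 1/2/3 codes are shifted to 0-based indices first):

import numpy as np
from tensorflow.keras.utils import to_categorical

fruit = np.array([1, 2, 3])            # apple=1, orange=2, banana=3
print(to_categorical(fruit - 1, num_classes=3))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]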
I've taken a quick course in neural networks to better understand them and now I'm trying them out for myself in R. I'm following this documentation of Keras.
The way I understand what is happening:
We are inputting a series of images and transforming these images into numerical matrices based on the arrangement of the pixels and the colors in those pixels. We then build a neural network model to learn the pattern of these arrangements, depending on the classification (0 to 9). We then use the model to predict which class an image belongs to. I'll be honest and admit I'm not entirely sure what y_train and x_train are. I simply see them as one training and one validation set, so I'm not sure what the difference between x and y is.
My question:
I've followed the steps to a T, the model runs fine, and the predictions look like they do in the documentation. Ultimately, the prediction looks like this:
I take this to mean that observation 1 in x_test is predicted to be category 7.
However, looking at x_test, it looks like this:
There is a 0 in every column and row, even if I scroll further down. This is where I get confused. I'm also not sure how to view the original images to check for myself how well they are being predicted. I would eventually like to draw a number myself in Paint or so and then see if the model can predict it, but for that I need to first understand what is going on. I feel I am close, but I just need a little nudge!
I think if you read more about the input and output layer's dimensions, that would help.
In your example:
Input layer:
A single training example (an image) has two dimensions, 28*28, which is then converted into a single vector of dimension 784. This acts as the input layer for the neural network.
So for m training examples, your input layer will have dimensions (m, 784). Speaking by analogy (to traditional ML systems), you can imagine that each pixel of an image is converted into a feature (x1, x2, ..., x784), and your training set is a dataframe with m rows and 784 columns, which is then fed into the neural network to compute y_hat = f(x1, x2, x3, ..., x784).
Output layer:
As the output of our neural network, we want it to predict which number it is, from 0 to 9. So for a single training example, the output layer has dimension 10, representing each number from 0 to 9, and for n testing examples the output layer would be a matrix with dimension n*10.
Our y is a vector of length n which would look something like [1,7,8,2,...], containing the true value for each testing example. But to match the dimensions of the output layer, the y vector is converted using one-hot encoding: imagine a length-10 vector representing the number 7 by putting a 1 at index 7 and zeros at the rest of the positions, i.e. [0,0,0,0,0,0,0,1,0,0].
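The question uses the R interface, but the same reshaping and one-hot encoding can be illustrated with the Python Keras API (a sketch, not the code from the documentation):

from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(x_train.shape)                        # (60000, 28, 28): m images of 28*28 pixels

x_train = x_train.reshape(60000, 784).astype("float32") / 255   # each image becomes a 784-feature row
y_train = keras.utils.to_categorical(y_train, 10)               # label 7 -> [0,0,0,0,0,0,0,1,0,0]
print(x_train.shape, y_train.shape)         # (60000, 784) (60000, 10)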
So in your question, if you wish to see the original image, you should be able to view it before reshaping the training examples, with something like image(mnist$test$x[1, , ]).
Hope this helps!!
y_train contains the labels and x_train is the training data, so images in this example. You need to use some kind of plotting library to plot the x's. In this example you are probably not expected to input your own drawings, and if you want to, you would need to preprocess them in the same way as MNIST and pass them to the model.
Let's say we have a system of ODE's that describe how X affects Y:
dX/dt = -k * X
dY/dt = Kin * (1 - (Vmax * X) / (Km + X)) - kout * Y
I am trying to use a neural network that takes X(0), Y(0), and t as inputs and outputs Y(t). I made a feed-forward network in TensorFlow and trained it on data generated with the above equations, using Y data generated with the initial value of X set to 5 and to 10. The initial value of Y I left constant, using its steady-state value (the value where dY/dt = 0 and X = 0). For testing, I tried initial X values in between and outside of the two values I trained with. For all testing, Y(0) was left the same.
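As a rough sketch of that setup (the rate constants, network sizes and training settings here are made-up placeholders, since the question does not give them):

import numpy as np
from scipy.integrate import solve_ivp
from tensorflow import keras

k, Kin, Vmax, Km, kout = 0.5, 1.0, 0.8, 2.0, 0.3    # hypothetical parameter values
Y_ss = Kin / kout                                   # steady state: dY/dt = 0 when X = 0

def ode(t, state):
    X, Y = state
    dX = -k * X
    dY = Kin * (1 - (Vmax * X) / (Km + X)) - kout * Y
    return [dX, dY]

# generate (X0, Y0, t) -> Y(t) training samples for the two initial values of X
samples, targets = [], []
for X0 in (5.0, 10.0):
    sol = solve_ivp(ode, (0, 10), [X0, Y_ss], t_eval=np.linspace(0, 10, 200))
    for t, y in zip(sol.t, sol.y[1]):
        samples.append([X0, Y_ss, t])
        targets.append(y)
x, y = np.array(samples, dtype="float32"), np.array(targets, dtype="float32")

# simple feed-forward regressor, as described in the question
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(3,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=200, verbose=0)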
The testing results for in-between values are very good, and the results for the outside values are pretty good as well. However, this is only over the time period that the network was trained over, say t=[0,10]. Once I try to predict a value past this time period, the predictions start to drift off.
Is there a better method of implementing the network so that I can predict values beyond the training interval? Ideally, I'd like to be able to predict the return of Y to steady state once X has reached 0. I've been reading about using RNNs; however, I need the network to be trained on sparse data, where the time points aren't evenly spaced. The network I used above was able to do this, at least for the trained interval. Also, most examples of RNNs that I've seen (that aren't for language processing) rely on predicting future time points from the previous time points, rather than working in the way that I am trying to use it.
An idea I have would be to use my original network to predict the values over the trained time range (and a lot of them, to create a rich dataset), and then feed that into an RNN to predict the values past the time range. Would this be a feasible idea, or are there other methods that I could try that would work better?