I'm running a sequence-to-sequence model in TensorFlow, for which I need to extensively pad my input data (sample lengths vary). As a result, any metric I calculate is heavily biased: if the true sample is ~10% of the data fed to the model, the errors appearing there are largely hidden by "correct" predictions on the padded part.
Thus, I'd like to calculate "true" metrics (accuracy, AUC or whatever) that take into account the real sample only. In numpy-ish code, I'd like to do something like this:
def adjusted_metrics(y_true, y_pred):
    # y is padded with 0, and the real y always ends with a nonzero value,
    # so the last nonzero index marks the end of the real sample
    last_index = np.nonzero(y_true)[0][-1] + 1
    return AUC(y_true[:last_index], y_pred[:last_index])
But I'm pretty new to TensorFlow, and:
I can't do that in TensorFlow code. In particular, I'm not able to find the index of the last nonzero element of y_true when it is a Tensor. I tried casting to numpy using tensorflow.experimental.numpy (no effect, it still appears as a Tensor?) and calling .numpy() on the tensor (not working, despite the fact that I don't have eager execution disabled). I also tried masking, but it's hard for me to find the mask dimensions, partly due to the following point:
All my attempts also seem inappropriate in the context of batches - y_true and y_pred are of shape (None, max_length). I suppose the calculation in batches is governed by my model, but I have no idea how (and whether) the metric calculation can be done per sample while keeping the whole learning process in batches.
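For context, the masking direction I was trying looks roughly like this (an untested sketch; it assumes 0 never occurs inside the real, unpadded part of y_true, and it flattens all samples in the batch together):

import tensorflow as tf

def masked_binary_accuracy(y_true, y_pred):
    mask = tf.not_equal(y_true, 0)               # True only on the real timesteps
    y_true_real = tf.boolean_mask(y_true, mask)  # drops the padded positions
    y_pred_real = tf.boolean_mask(y_pred, mask)
    return tf.keras.metrics.binary_accuracy(y_true_real, y_pred_real)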
Any advice? :)
I searched a lot for an answer but wasn't able to find a satisfying one.
If I understood correctly, during model.fit(), Keras prints the loss for the last batch to terminal.
If I call model.evaluate() on the training set I get the loss value for the whole set.
So, intuitively, if I call model.evaluate() on a single instance of the training set, I should get a value that is a fraction of the value I would get calling model.evaluate() on the whole training set. But instead I get a value that is close to the full-set value, or even ~10 times larger. Any idea why?
If I understood correctly, during model.fit(), Keras prints the loss for the last batch to terminal.
Generally speaking yes, but this also depends on your verbose parameter; if it is set to 2 you are going to get one line per epoch, but if you set it to 1 you are going to get a progress bar that gives info on each batch and other things.
This also prints any other metrics you included in your model (like accuracy, MSE, etc.).
Now, I think that the intuitive behavior you expected is not quite right. First, I must say that this will also depend on your specific model and architecture, as some features like Dropout Layers could have their own specific interactions that may change the result of your loss and metrics.
The thing is that the model.evaluate() method does the calculations in batches, as specified in the docs. Two important arguments are batch_size, which is the number of samples per evaluation step, and steps, which are the number of steps (batches) to finish the evaluation.
Digging a bit into the source code of model.evaluate(), we can see that it averages the loss and the other metrics over all the evaluated batches/samples.
This means that if you pass only one sample, the "average" you get back is just that single sample's loss, whereas if you provide more samples (like your whole training set) the result is averaged over all of them. So the single-sample value is not a fraction of the full-set value; it can easily be larger than the mean over the whole set.
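To make this concrete, a tiny illustration (model, x_train and y_train stand for your own objects, and I assume no extra metrics are compiled):

full_loss = model.evaluate(x_train, y_train, verbose=0)            # mean loss over all samples
single_loss = model.evaluate(x_train[:1], y_train[:1], verbose=0)  # that one sample's loss
# single_loss is not full_loss / len(x_train); it is an average over a
# single value, so it can easily be larger or smaller than the full mean.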
I have a TensorFlow question regarding Reinforcement Learning. I have everything working and training, but something feels redundant; I want to point it out and hear your thoughts:
Let's assume something simple like episodic REINFORCE. Given this standard setup:
state -> network -> logits
When I want to train (when the episode is complete), I need to:
1. pass in an array of states (saved from running the episode) to a TF placeholder
2. do a forward pass with those saved states to produce logits
3. compute log_probs (using the saved array of actions)
4. compute the loss (using the saved array of advantages)
This works fine. However, steps 1 & 2 seem redundant. I'd prefer to calculate log_probs at each step of the episode, while the episode is being run. That way I wouldn't have to do steps 1, 2 and 3 during training, and the forward pass would only be performed once (during the episode); I'd have my log_probs ready by the time the episode was over.
However, if I create placeholders for log_probs and advantages, and don't pass in states (for the redundant forward prop), then I don't know how to get TF to know where the variables are for backprop. I get the error:
ValueError: No gradients provided for any variable
So my questions are:
if I'm passing in states, is it true that forward prop is being run again during training?
can I prevent this by my method above, finding some way to tell TF where the gradients are?
In case anyone wants to see actual code (I tried to be clear enough not to need it), here is a gist of the script in question
EDIT: I think the answer has something to do with using optimizer.compute_gradients (where I can pass in variables) and optimizer.apply_gradients, but I'm not sure how yet...
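For reference, the pattern I have in mind looks roughly like this (TF1-style, untested; logits, actions_ph, advantages_ph and num_actions stand in for the tensors in my actual graph):

log_probs = tf.nn.log_softmax(logits)
picked_log_probs = tf.reduce_sum(
    log_probs * tf.one_hot(actions_ph, num_actions), axis=1)
loss = -tf.reduce_mean(picked_log_probs * advantages_ph)

optimizer = tf.train.AdamOptimizer(1e-3)
grads_and_vars = optimizer.compute_gradients(loss)    # (gradient, variable) pairs
train_op = optimizer.apply_gradients(grads_and_vars)  # equivalent to optimizer.minimize(loss)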
I'm using Keras to solve a multi-class problem. My data is very unbalanced, so I'm trying to create something similar to a confusion matrix. My dataset is very large and saved as HDF5, so I use HDF5Matrix to fetch X and Y, which makes scikit-learn's confusion matrix impractical (as far as I know).
I've seen it is possible to save the predictions and true labels, or output the error per label, however a more elegant solution would be to create a multi-dimensional metric that accumulates the (predicted,true) label pairs (sort of like a confusion matrix).
I have used the following callback to try and peek into what's going on per batch / epoch:
from keras.callbacks import LambdaCallback

batch_print_callback = LambdaCallback(
    on_batch_end=lambda batch, logs: print(logs),
    on_epoch_end=lambda epoch, logs: print(logs))
but it only shows a single value per metric (an average of sorts).
I've also tried to see if it's possible to return y_pred / y_true as follows (to check whether I can print a multi-dimensional value in the logs):
def pred(y_true, y_pred):
    return y_pred

def true(y_true, y_pred):
    return y_true
However, it doesn't return a multi-dimensional value as I expected.
So basically, my question is: can I use Keras to accumulate a multi-dimensional metric?
Well, to the best of my knowledge, it is not possible, since K.mean is applied to the metric tensor before its value is returned. I posted an issue about this on the Keras GitHub.
The best design I came up with is a metric for each cell in the confusion matrix, plus a callback that collects them, inspired by the thread mentioned in the question.
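As a rough sketch of the idea (my own untested approximation, not the linked code; it assumes one-hot labels and softmax predictions), each cell of the confusion matrix becomes its own metric:

from keras import backend as K

def confusion_cell(true_class, pred_class):
    def cell(y_true, y_pred):
        true_match = K.cast(K.equal(K.argmax(y_true, axis=-1), true_class), K.floatx())
        pred_match = K.cast(K.equal(K.argmax(y_pred, axis=-1), pred_class), K.floatx())
        return true_match * pred_match  # Keras then averages this over the batch
    cell.__name__ = 'cm_{}_{}'.format(true_class, pred_class)
    return cell

# metrics=[confusion_cell(i, j) for i in range(n_classes) for j in range(n_classes)]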
A sort-of working solution can be found here
I have recently started working on ECG signal classification into various classes. It is basically a multi-label classification task (4 classes in total). I am new to Deep Learning, LSTMs and Keras, which is why I am confused about a few things.
I am thinking about giving the normalized original signal as input to the network; is this a good approach?
I also need to understand the training input shape for the LSTM, as ECG signals are of variable length (9000 to 18000 samples) and a classifier usually needs fixed-size input. How can I handle this type of input with an LSTM?
Finally, what should the structure of a deep LSTM network be for such long inputs, and how many layers should I use?
Thanks for your time.
Regards
I am thinking about giving the normalized original signal as input to the network; is this a good approach?
Yes, this is a good approach. It is actually quite standard in Deep Learning to feed the network normalized or rescaled input.
This usually helps your model converge faster, as you now work inside a smaller range (e.g. [-1, 1]) instead of the larger un-normalized range of your original input (say [0, 1000]). It also tends to give better, more stable results, as it mitigates problems like vanishing gradients and suits modern activation functions and optimizers better.
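As a small illustration, a simple per-signal rescaling to [-1, 1] could look like this (x stands for one raw ECG signal as a NumPy array; z-scoring with mean and standard deviation is an equally common choice):

import numpy as np

x = np.asarray(x, dtype='float32')
x_scaled = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0  # now in [-1, 1]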
I also need to understand the training input shape for the LSTM, as ECG signals are of variable length (9000 to 18000 samples) and a classifier usually needs fixed-size input. How can I handle this type of input with an LSTM?
This part is really important. You are correct: an LSTM expects to receive inputs with a fixed shape, one that you know beforehand (in fact, any Deep Learning layer expects fixed-shape inputs). This is also explained in the Keras docs on Recurrent Layers, where they say:
Input shape
3D tensor with shape (batch_size, timesteps, input_dim).
As we can see, it expects your data to have a number of timesteps as well as a feature dimension for each of those timesteps (the batch size is usually 1 in this example). To illustrate, suppose one input sample looks like [[1,4],[2,3],[3,2],[4,1]]. Then, using a batch_size of 1, the shape of your data would be (1, 4, 2), as you have 4 timesteps, each with 2 features.
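In code, that toy example is simply:

import numpy as np

data = np.array([[[1, 4], [2, 3], [3, 2], [4, 1]]])  # 1 sample, 4 timesteps, 2 features
print(data.shape)  # (1, 4, 2)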
So, bottom line, you have to make sure that you pre-process your data so it has a fixed shape that you can then pass to your LSTM layers. The exact choice you will have to make yourself, as you know your data and problem better than we do.
Maybe you can fix the number of samples you keep from each signal, discarding some so that every signal has the same length (since your signals are between 9k and 18k samples, choosing 9000 and discarding the extra samples from the longer ones could be the logical choice). You could also apply some other transformation that maps inputs of 9000-18000 samples to a fixed size.
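For example, something along these lines could enforce a fixed length (signals here stands for a list of your variable-length 1-D signals; a trailing feature dimension would still need to be added for the LSTM):

from keras.preprocessing.sequence import pad_sequences

# keep the first 9000 samples of every signal, zero-padding any shorter ones
x_fixed = pad_sequences(signals, maxlen=9000, dtype='float32',
                        padding='post', truncating='post')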
Finally, what should the structure of a deep LSTM network be for such long inputs, and how many layers should I use?
This one is really quite broad and doesn't have a unique answer. It would depend on the nature of your problem, and determining those parameters a priori is not so straightforward.
What I recommend you do is to start with a simple model first, and then add layers and blocks (neurons) incrementally until you are satisfied with the results.
Try just one hidden layer first, train and test your model, and check its performance. You can then add more blocks and see if the performance improves, and likewise add more layers, until you are satisfied; a minimal starting point is sketched below.
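As a concrete starting point, such a minimal model might look like the following (the layer size is arbitrary, and the input shape assumes the fixed length of 9000 samples with a single feature discussed above):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(64, input_shape=(9000, 1)))   # one recurrent hidden layer
model.add(Dense(4, activation='softmax'))    # one output per class
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])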
This is a good way to create Deep Learning models, as you will arrive at the results you want while keeping your network as lean as possible, which in turn helps with execution time and complexity. Good luck with your coding, hope you find this useful.
My first time using TensorFlow on the MNIST dataset, I had a really simple bug where I forgot to take the mean of my error values before passing them to the optimizer.
In other words, instead of
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_))
I accidentally used
loss = tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_)
Not taking the mean or sum of the error values threw no errors when training the network, however. This got me thinking: is there actually a case when someone would need to pass multiple loss values into an optimizer? And what was happening when I passed a Tensor that was not of size [1] into minimize()?
They are being added up. This is a side-product of TensorFlow using reverse-mode automatic differentiation, which requires the loss to be a scalar.
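In other words (a TF1-style sketch reusing y and y_ from the question), these two calls produce the same gradients:

per_example_loss = tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_)

optimizer = tf.train.GradientDescentOptimizer(0.5)
train_op_a = optimizer.minimize(per_example_loss)                 # vector loss, implicitly summed
train_op_b = optimizer.minimize(tf.reduce_sum(per_example_loss))  # explicit sum
# Only tf.reduce_mean would change the result, by rescaling the gradients
# by 1 / batch_size.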