I am working on time series forecasting with a Keras LSTM. I take the last n_input_steps values of the series and try to predict one step forward. For example, if my time series is [1, 2, 3, 4] and n_input_steps = 2, the supervised learning dataset would be:
[1,2]--> 3
[2,3]--> 4
Thus, the series to be forecast (y_true) would be [3,4].
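For concreteness, here is a minimal sketch of how such a windowed dataset could be built (make_supervised is my own illustrative helper, not part of the original code):
import numpy as np

def make_supervised(series, n_input_steps):
    # Slide a window of n_input_steps over the series; the next value is the target.
    X, y = [], []
    for i in range(len(series) - n_input_steps):
        X.append(series[i:i + n_input_steps])
        y.append(series[i + n_input_steps])
    # Keras LSTMs expect inputs of shape (samples, timesteps, features).
    return (np.array(X, dtype='float32').reshape(-1, n_input_steps, 1),
            np.array(y, dtype='float32'))

trainX, trainY = make_supervised([1, 2, 3, 4], 2)
# trainX -> [[[1], [2]], [[2], [3]]], trainY -> [3, 4]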
Now I have a Keras model to predict this type of series:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

model = Sequential()
model.add(LSTM(neurons, activation='relu', input_shape=(n_steps_in, 1)))
model.add(RepeatVector(1))
model.add(LSTM(neurons, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss=my_loss, run_eagerly=True)
hist = model.fit(trainX, trainY, epochs=epochs, verbose=2, validation_data=(testX, testY))
And my loss function is:
from tensorflow.keras import backend as kbe

def my_loss(y_true, y_pred):
    print(kbe.shape(y_true))
    y_true_c = kbe.cast(y_true, 'float32')
    y_pred_c = kbe.cast(y_pred, 'float32')
    ytn = y_true_c.numpy()  # only possible because run_eagerly=True
    print(ytn.shape)
    # Do some complex calculation requiring the elements of y_true_c and y_pred_c.
    # ...
    return result
In my poor understanding, if I call model.fit(trainX, trainY, ...) with trainX corresponding to [[1, 2], [2, 3]] (an array in the proper shape) and trainY corresponding to [3, 4], the y_true inside my_loss should be a tensor corresponding to [3, 4]. However, this is not what I am finding. The print output of my loss function (the shapes of the tensor and the array) is:
tf.Tensor([32 1 1], shape=(3,), dtype=int32)
(32, 1, 1)
regardless of the size of the input array. And if I print the values of the array, they bear no resemblance to the original values. Even if I remove all the layers of the model, keeping a bare Sequential, I get the same shapes. Therefore, I am completely lost.
Based on the comments above, I did further searching and found the explanation: there is a default batch size at work, as pointed out by Jorge Avila. The length of 32 is the default batch size used by Keras. The truth data and the predicted data come in batches of this size, so I should use batch_size=len(trainX) in the call to model.fit(). Furthermore, on top of that, the data comes in shuffled, which is why it becomes even more confusing; so I also have to use shuffle=False in model.fit().
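A minimal sketch of the adjusted call, using the same variables as above:
hist = model.fit(trainX, trainY, epochs=epochs, verbose=2,
                 batch_size=len(trainX),  # a single batch holding the whole training set
                 shuffle=False,           # keep samples in their original order
                 validation_data=(testX, testY))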
However, as pointed out by Jakub, even with these modifications my intended loss function will not work, because Keras requires symbolic derivatives of the function, and those cannot be obtained from logic that operates on the numpy values. So I have to start from scratch with another loss function acceptable to Keras, one written purely in tensor operations, as sketched below.
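For illustration, a loss built only from backend tensor operations remains differentiable. The weighting here is a made-up example, not the original "complex calculation":
from tensorflow.keras import backend as kbe

def my_loss(y_true, y_pred):
    y_true_c = kbe.cast(y_true, 'float32')
    y_pred_c = kbe.cast(y_pred, 'float32')
    # Example only: squared error weighted by the magnitude of the target.
    # Every operation is a tensor op, so Keras can differentiate through it.
    return kbe.mean(kbe.square(y_true_c - y_pred_c) * (1.0 + kbe.abs(y_true_c)))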
Keep batch_size in multiples of 32, up to around 1024, depending on your data; powers of 2 are a common convention. But you shouldn't have to use shuffle in fit: TimeseriesGenerator is where the changes need to be made, not fit.
I have a very simple TensorFlow 2 Keras model to do penalized logistic regression on some data. I was hoping to get the probabilities of each class, instead of just the predicted class (0 or 1).
I think I got what I wanted, but I just wanted to make sure that these numbers are what I think they are. I used the model.predict_on_batch() function from tensorflow.keras, but the documentation just says that it provides a numpy array of predictions. I believe I am getting probabilities, but I was hoping someone could confirm.
The model code looks like this:
import tensorflow as tf
from tensorflow.keras import layers

feature_layer = tf.keras.layers.DenseFeatures(features)  # 'features' is defined elsewhere
model = tf.keras.Sequential([
    feature_layer,
    layers.Dense(1, activation='sigmoid', kernel_regularizer=tf.keras.regularizers.l1(0.01))
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
predictions = model.predict_on_batch(validation_dataset)
print('Predictions for a single batch.')
print(predictions)
So the predictions I am getting look like:
Predictions for a single batch.
tf.Tensor(
[[0.10916319]
[0.14546806]
[0.13057315]
[0.11713684]
[0.16197902]
[0.19613355]
[0.1388464 ]
[0.14122346]
[0.26149303]
[0.12516734]
[0.1388464 ]
[0.14595506]
[0.14595506]]
Now, predictions in a logistic regression would normally be an array of either 0 or 1, but here I am getting floating point values. Moreover, I am getting just a single value per example, when there are actually two probabilities: the probability that the example is a 0 and the probability that it is a 1. So I would have expected an array of two probabilities for each row or example. Of course, Probability(Y = 0) + Probability(Y = 1) = 1, so this might just be a concise representation.
So again: do the values in the array represent the probability that the example has Y = 1, or something else?
The values you printed are the probabilities corresponding to each one of your classes. Since you used sigmoid activation on your last layer, they will be in the range [0, 1].
Your model is very shallow (few layers), which is why these predicted probabilities come out so close to one another. I suggest you add more layers.
Conclusion
To answer your question: these are probabilities, but only because of your activation function choice (sigmoid). If you had used tanh activation, the outputs would be in the range [-1, 1] and could not be read as probabilities.
Note that these probabilities are "binary" for each example due to the use of the binary_crossentropy loss: 10.92% that class 1 is present and 89.08% that it is not, and so on for the other rows. If you want the predictions to follow probabilistic rules across multiple classes (i.e., sum to 1), then you should consider categorical_crossentropy.
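If you want the explicit two-column form, you can derive it from the single sigmoid output. A small sketch, reusing the predictions variable from the question:
import numpy as np

p1 = np.asarray(predictions).reshape(-1)  # P(Y = 1), the values printed above
p0 = 1.0 - p1                             # P(Y = 0), by complement
two_class = np.stack([p0, p1], axis=1)    # one row per example; each row sums to 1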
I am given a set of the following data:
a list of nanopore signal data, e.g. [88.2, 92.5, ... 101.5]
a numerical value that tells where a specific material is located within the signal data, e.g. 5 => that specific thing is located at index 5 in the corresponding signal data.
I am trying to use Keras to implement machine learning with bidirectional LSTM models, to learn the patterns between the signal data and the numerical index value.
I have read some sample code on the internet and tried to use it, but I am facing some issues. I am not even sure if my approach is appropriate.
Since most of the code I've read had equal dimensions for inputs and labels, I made an array of zeros matching the length of the signal data and put a 1 at the index given by the numerical value (see the sketch after the example below).
e.g. Signal = [1, 2, 4, 5]
Label = [0, 0, 1, 0]
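A small sketch of that label construction; signal and index stand in for one sample of my data:
import numpy as np

signal = np.array([1, 2, 4, 5])
index = 2                        # position of the material within the signal
label = np.zeros(len(signal))
label[index] = 1                 # -> [0., 0., 1., 0.]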
After this modification of the data, I tried implementing a bidirectional LSTM, but I am receiving a ValueError:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, TimeDistributed, Dense

# sig_dataset shape: (3912, 24160) -> 3912 sequences of length 24160
# lab_dataset shape: (3912, 24160) -> 3912 sequences of length 24160
# Conversion to 3 dimensions
sig_dataset = sig_dataset.reshape(1, len(sig_dataset), len(sig_dataset[0]))
lab_dataset = lab_dataset.reshape(1, len(lab_dataset), len(lab_dataset[0]))
train_size = int(len(sig_dataset) * 0.8)
trainX, trainY = sig_dataset[:train_size], lab_dataset[:train_size]
validX, validY = sig_dataset[train_size:], lab_dataset[train_size:]

# define LSTM
model = Sequential()
model.add(Bidirectional(LSTM(512, return_sequences=True), input_shape=(24160, 1)))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['acc'])
hist = model.fit(trainX,
                 trainY,
                 epochs=10,
                 batch_size=1,
                 validation_data=(validX, validY))
ValueError: Error when checking input: expected bidirectional_32_input to have 3 dimensions, but got array with shape (3184, 24160)
What am I doing wrong? Is this approach even correct? Any advice would be appreciated.
I have a dataset consisting of time series of different lengths. For instance, consider these:
import numpy as np

ts1 = np.random.rand(230, 4)
ts2 = np.random.rand(12309, 4)
I have 200 sequences in the form of a list of arrays:
input_x = [ts1, ts2, ..., ts200]
These time series have labels 1 if good and 0 if not. Hence my labels will be something like
labels = [0, 0, 1, 0, 1, ....]
I am building a Keras model as follows:
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv1D(64, 3, activation='relu', input_shape=(None, 4)),
    keras.layers.MaxPool1D(3),
    keras.layers.Conv1D(160, 10, activation='relu'),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(2, activation='softmax')
])
The 4 in the input shape of the first convolution layer corresponds to the number of columns in each time series, which is constant (think of it as 4 sensors returning measurements for different operations). The objective is to classify whether a time series is good or bad (0 or 1); however, I am unable to figure out how to train this using Keras.
Running this line
model.fit(input_x, labels, epochs=5, batch_size=1)
Returns an error
Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 200 arrays
Even using np.array(input_x) gives an error. How can I train this model with sequences of variable lengths? I know padding is an option, but that's not what I am looking for. Also, I don't want to use an RNN with a sliding window. I am really looking for a solution with a 1D CNN that works with sequences of variable lengths. Any help would be much appreciated!
When working with time series you want to define the input to the NN as (batch_size, sequence_length, features), which corresponds to input_shape=(sequence_length, 4) in your case; the arrays you pass to fit must likewise be batched into the shape (batch_size, sequence_length, features). You will have to decide on a maximum sequence length to process for the purposes of training and generating predictions.
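A small sketch of what that looks like with padding. max_len is a value you would have to choose, and the compile call is assumed since the question omits it; pad_sequences is the Keras utility for padding/truncating a list of per-sample arrays into one fixed-shape array:
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_len = 1024  # chosen maximum sequence length (assumption for illustration)
# Pad/truncate every (length_i, 4) series to (max_len, 4) and stack into one array.
padded_x = pad_sequences(input_x, maxlen=max_len, dtype='float32',
                         padding='post', truncating='post')
labels_arr = np.array(labels)

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])  # assumed; integer labels with a 2-unit softmax
model.fit(padded_x, labels_arr, epochs=5, batch_size=16)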
I am using Keras to build a ConvNet for the first time. My layers are as follows:
from tensorflow.keras.layers import (Conv2D, Activation, MaxPooling2D,
                                     Flatten, Dense, Softmax)

layers = [
    Conv2D(8, kernel_size=(4, 4), padding='same', input_shape=(200, 180, 3),
           kernel_initializer="glorot_normal", data_format="channels_first"),
    Activation("relu"),
    MaxPooling2D(pool_size=(8, 8), padding='same', data_format='channels_first'),
    Conv2D(16, (2, 2), padding='same', kernel_initializer="glorot_normal"),
    Activation("relu"),
    MaxPooling2D(pool_size=(4, 4), padding='same', data_format='channels_first'),
    Conv2D(4, (3, 3), padding='same', kernel_initializer="glorot_normal"),
    Activation("relu"),
    MaxPooling2D(pool_size=(2, 2), padding='same', data_format='channels_first'),
    Flatten(),
    Dense(2, input_shape=(48,)),
    Softmax(axis=-1)
]
# Edit: here is the part for compiling the model and fitting it
from tensorflow.keras.models import Sequential

model = Sequential(layers)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
trainHistory = model.fit(x=X_train, y=Y_train, batch_size=3, epochs=1000)
My labels array is of shape (num_samples, 2). But when I try to fit the model, it gives me the error that softmax_1 expected to have shape (1,). But I have clearly set the units of Dense to 2, and softmax returns output of the same dimension as its input.
So where did the (1,) come from? I tried a dummy one-dimensional label array and it runs. So what am I doing wrong? How do I use the 2-dimensional array that I have?
The problem is that you are using sparse_categorical_crossentropy as the loss function. This loss function is used when the given labels (i.e. Y_train) are encoded as integers (i.e. 0, 1, 2, ...). However, if the labels are one-hot encoded, which seems to be the case in your code, you need to use categorical_crossentropy as the loss function instead, or convert the labels to integers, as sketched below.
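A small sketch of the two possible fixes; Y_train is the one-hot array from the question:
import numpy as np

# Option 1: keep the one-hot labels and switch the loss.
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Option 2: keep sparse_categorical_crossentropy and convert
# one-hot rows such as [0, 1] into integer class ids such as 1.
Y_train_int = np.argmax(Y_train, axis=-1)
trainHistory = model.fit(x=X_train, y=Y_train_int, batch_size=3, epochs=1000)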
I am working on a simple CNN classifier using Keras with a TensorFlow backend.
import numpy
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Dropout, Flatten, Dense

def cnnKeras(training_data, training_labels, test_data, test_labels, n_dim):
    print("Initiating CNN")
    seed = 8
    numpy.random.seed(seed)
    model = Sequential()
    model.add(Convolution2D(64, 1, 1, init='glorot_uniform',
                            border_mode='valid', input_shape=(16, 1, 1),
                            activation='relu'))
    model.add(MaxPooling2D(pool_size=(1, 1)))
    model.add(Convolution2D(32, 1, 1, init='glorot_uniform',
                            activation='relu'))
    model.add(MaxPooling2D(pool_size=(1, 1)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(1, activation='softmax'))
    # Compile model
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    model.fit(training_data, training_labels,
              validation_data=(test_data, test_labels),
              nb_epoch=30, batch_size=8, verbose=2)
    scores = model.evaluate(test_data, test_labels, verbose=1)
    print("Baseline Error: %.2f%%" % (100 - scores[1] * 100))
    # model.save('trained_CNN.h5')
    return None
It is a binary classification problem, but I keep getting the message "Received a label value of 1 which is outside the valid range of [0, 1)", which does not make any sense to me. Any suggestions?
The range [0, 1) contains every number from 0 up to, but excluding, 1; so 1 is not a value in the range [0, 1).
I am not 100% sure, but the issue could be due to your choice of loss function. For binary classification, binary_crossentropy should be a better choice.
In the last Dense layer you used model.add(Dense(1, activation='softmax')). Here the 1 restricts the valid label values to [0, 1); change it to match the number of output labels. For example, if your output labels are in [0, 7), use model.add(Dense(7, activation='softmax')).
Peculiarities of sparse categorical crossentropy
The loss function sparse_categorical_crossentropy interprets the final layer, in the context of classifiers, as a set of probabilities, one per possible class, and the label value as the number of the class. (The TensorFlow/Keras documentation goes into a bit more detail.) So x neurons in the output layer are compared against label values in the range from 0 to x-1; having just one neuron in the output layer makes it a "unary" classifier, which doesn't make sense.
If it's a classification task where you want output values in the form 0 to x-1, then you can keep sparse categorical crossentropy, but you need to set the number of neurons in the output layer to the number of classes you have. Alternatively, you can encode the output as a one-hot vector and use the categorical crossentropy loss function instead of sparse categorical crossentropy (both setups are sketched after this answer).
If it's not a classification task and you want to predict arbitrary real-valued numbers as in a regression, then categorical crossentropy is not a suitable loss function at all.
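For the binary case in the question, a minimal sketch of two consistent output-layer/loss pairings. These are alternatives: pick one, and the rest of the model stays unchanged:
# Option 1: integer labels 0/1 with sparse categorical crossentropy;
# this needs one output neuron per class, i.e. two.
model.add(Dense(2, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])

# Option 2: a single sigmoid output neuron with binary crossentropy.
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam', metrics=['accuracy'])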
Cray and Shaili's answer was correct! I had a range of outcomes from 1 to 6, and the line:
tf.keras.layers.Dense(6, activation = 'softmax')
produced that error message, saying that values were outside of the range [0, 6). I had thought that it was a labels problem (were all values present in both the training and validation label sets?) and was chasing that instead.
If the error reports a range of [0, 4), you can just add one to the number of classes (labels). For example, change this:
layers.Dense(4)
to:
layers.Dense(5)
The same applies for [0, 1).
I had this problem when I had labels of type float; casting them to int solved the problem.
Another potential answer to this question concerns the workspace. If it's not a logic/sparseness/entropy error as the other answers suggest, keep reading:
If you created a workspace to hold the data as the model trained, the old workspace data can cause this error when you retrain with new samples, especially with a different number of folders, if you are using the folders as the labels for classification.
Example: I trained my original sample set (4 label folders), and when I tried to retrain on the new sample set (3 label folders), I received the error:
Received a label value of 3 which is outside the valid range of [0, 3)
This is likely because the old sample set's cached value of 4 folders clashed with the new sample set's 3 folders. All I know for sure is that once I cleared the old information out of my workspace and ran it again, it ran to completion. This was an isolated change after multiple failures, so I am certain it is what solved the issue.
Disclaimer: I am using C# and ML.NET, but they still utilize TensorFlow, which is where both of our errors were produced, so this should absolutely apply to the question.
For me, the issue was that the number of classes passed to the model was less than the actual number of classes in the data. Hence the model predicted -1 for most cases, giving the out-of-range error.