I'm having an issue with the predict function: it predicts all 0's or all 1's from my model. Here is my model:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential(
    [
        layers.BatchNormalization(),
        layers.Dense(200, activation="relu"),
        layers.Dense(500, activation="relu"),
        layers.Dense(1300, activation="relu"),
        layers.Dense(2000, activation="relu"),
        layers.Dense(1320, activation="relu"),
        layers.Dense(710, activation="relu"),
        layers.Dense(150, activation="relu"),
        layers.Dense(30, activation="relu"),
        layers.BatchNormalization(),
        layers.Dense(1, activation="sigmoid"),
    ]
)
model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adam(learning_rate=0.001),
              metrics=[metrics])  # `metrics` is a list defined elsewhere
history = model.fit(training, target, batch_size=2048, epochs=100,
                    shuffle=True, validation_split=0.2)
I'm very new to deep learning and am trying to build classification models. The target is binary, 0 or 1, indicating whether a customer will leave or stay as a customer in the long run. I've checked the data for nulls and NaNs.
I've looked at a lot of posts about what could cause this, and for the most part it was people using an activation function suited to regression rather than classification; the answer there was that with binary crossentropy you should use a sigmoid output (Why does a binary Keras CNN always predict 1?). I thought my network's output would be correct, since I am using ReLU hidden layers and a sigmoid output with binary crossentropy, but whenever I predict, the output is persistently all 0's or all 1's. The layer sizes might not make much sense; I'm still very new at this and am experimenting to see how the layers affect the results when I train and evaluate.
Here is roughly how I am calling predict on the data:
import pandas as pd

data = pd.read_csv("judge.csv")
samples_to_predict = data.drop(
    ['Surname', 'CreditScore', 'Geography', 'Gender', 'Tenure',
     'NumOfProducts', 'HasCrCard', 'EstimatedSalary'], axis=1)
prediction = loaded_model.predict(samples_to_predict.values)
print(prediction)
I've been trying to debug this for a while, and any pointer as to where the error could be coming from would be welcome. I've tried increasing the epochs to 1000, lowering the learning_rate, and lowering the batch_size; I believe BatchNormalization should take care of scaling my data (I might be misunderstanding that); I tried a simpler network of just three Dense layers, two ReLU and one sigmoid; and I checked that the data I'm predicting on is a NumPy array. All of these produced the same result: predict outputs all 0's or all 1's.
Turns out I was calling predict with columns of the data that I had not trained the model on. For example, I have a column titled CustomerID that I dropped before training, but when predicting I had forgotten to drop it, which made my model predict all 0's or all 1's. After fixing that and making sure I predict using only the columns I trained with, I got predictions that were not all 0's or all 1's.
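For example, a minimal sketch of the corrected predict step, keeping one shared drop list so training and prediction see exactly the same features (the list mirrors the question; CustomerID is the column that was accidentally left in):

import pandas as pd

# The SAME columns must be removed for training and for prediction;
# CustomerID was the one accidentally left in at predict time.
DROP_COLS = ['CustomerID', 'Surname', 'CreditScore', 'Geography', 'Gender',
             'Tenure', 'NumOfProducts', 'HasCrCard', 'EstimatedSalary']

data = pd.read_csv("judge.csv")
samples_to_predict = data.drop(DROP_COLS, axis=1)
prediction = loaded_model.predict(samples_to_predict.values)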
Related
I'm working on a binary classification model for leaves from the Swedish leaf dataset and thought transfer learning could be practical. I found this tutorial, but in the compile call I want to use metrics other than accuracy. When I try to use AUC or TP/FP/TN/FN, a ValueError is raised, claiming that the shape of the true y, (None, 1), and the shape of y_pred, (None, 2), are incompatible.
I fail to understand:
why would y_pred have this shape?
how can the accuracy be calculated, but not the parts of the confusion matrix?!
A solution without a reasoned explanation is also very welcome :)
import tensorflow as tf
import tensorflow_hub as hub

feature_extractor_model = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4"
pretrained_model_without_top_layer = hub.KerasLayer(
    feature_extractor_model, input_shape=(224, 224, 3), trainable=False)

classes_num = 2
model = tf.keras.Sequential([
    pretrained_model_without_top_layer,
    tf.keras.layers.Dense(classes_num)
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[['acc'],
             [tf.keras.metrics.TruePositives(), tf.keras.metrics.FalsePositives(),
              tf.keras.metrics.TrueNegatives(), tf.keras.metrics.FalseNegatives()]])
model.fit(X_train_scaled, y_train, steps_per_epoch=9, epochs=5)
If you have two classes (e.g. cats and dogs), you can encode the labels either sparsely, as 0 or 1, or one-hot, as [1,0] and [0,1].
Your training labels are sparsely encoded, hence the SparseCategoricalCrossentropy loss. Metrics are functionally just losses, so any metric you use also needs to accept sparse labels; 'acc' works because Keras resolves it to a sparse categorical accuracy to match your loss, whereas TruePositives and friends compare y_true and y_pred element-wise and therefore need them to have the same shape. In your case, write a "custom" metric that accepts a sparse y_true, one-hots it, and passes it to the precision/recall/etc. metric.
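A minimal sketch of such a wrapper, assuming two classes and logit outputs as in the model above (the class name is hypothetical):

import tensorflow as tf

class SparseTruePositives(tf.keras.metrics.TruePositives):
    """TruePositives that accepts sparse integer labels (assumes 2 classes)."""
    def update_state(self, y_true, y_pred, sample_weight=None):
        # One-hot the sparse labels so y_true matches y_pred's (None, 2) shape,
        # and squash the logits to probabilities before the default 0.5 threshold.
        y_true = tf.one_hot(tf.cast(tf.reshape(y_true, [-1]), tf.int32), depth=2)
        y_pred = tf.nn.softmax(y_pred)
        return super().update_state(y_true, y_pred, sample_weight)

# Usage: metrics=[SparseTruePositives()], with analogous wrappers for
# FalsePositives, TrueNegatives, and FalseNegatives.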
I have started some exploration into TensorFlow and its modelling capabilities.
I have a number of normalised 400x500 images stored as numpy arrays.
These are organised as:
180 for training category A,
20 for testing category A,
50 for training category B, and
11 for testing category B.
For the moment I am using the introductory model:
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(400, 500)),
    keras.layers.Dense(12, activation=tf.nn.relu),
    keras.layers.Dense(8, activation=tf.nn.sigmoid),
    keras.layers.Dense(2, activation=tf.nn.softmax)])
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=5)
train_images contains [0-199] category A and [200-249] category B images.
train_labels contains the respective labels.
During execution, accuracy always sits at around 0.78, irrespective of the number of epochs used. Loss also does not change.
0.78 is close to the ratio of category A images to the total (180/230 ≈ 0.78), which suggests the model is always predicting category A.
I would appreciate any assistance to help get going.
Thank you.
I think your model is too simple. You have three dense layers with just a few neurons each, which is unlikely to be able to recognize complex features in an input of size 200,000 (the 400 × 500 image flattened). You should first try a few convolutional layers.
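A minimal sketch of that suggestion (the layer sizes are illustrative assumptions, not tuned values); the convolutions learn local image features before the dense head classifies:

import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    # Add a channel dimension so Conv2D can operate on the 400x500 images.
    keras.layers.Reshape((400, 500, 1), input_shape=(400, 500)),
    keras.layers.Conv2D(16, 3, activation='relu'),
    keras.layers.MaxPooling2D(4),
    keras.layers.Conv2D(32, 3, activation='relu'),
    keras.layers.MaxPooling2D(4),
    keras.layers.Flatten(),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(2, activation='softmax')])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])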
I'm trying to get the hang of Keras and to get a basic time series prediction working. My input is a list of random ints between 0 and 10, such as [1,3,2,4,7,5,9,0], and my labels are the same as the input but delayed, such as [X,X,1,3,2,4,7,5]. I'm trying to have my model learn this relationship of remembering past data points.
My code is:
import numpy
import tensorflow as tf
from tensorflow import keras

labels = keras.utils.to_categorical(output, num_keys)
model = keras.Sequential([
    keras.layers.LSTM(10),
    keras.layers.Dense(10, activation='relu'),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss=tf.keras.losses.categorical_crossentropy,
              metrics=['accuracy'])
model.fit(input, labels, epochs=30, verbose=2, shuffle=False)
and I get the error: ValueError: Please provide as model inputs either a single array or a list of arrays. You passed: x=[7, 6,...
I've tried reformatting my input with:
input = numpy.array([[i, input[i]] for i in range(len(input))])
input = numpy.reshape(input, input.shape + (1,))
and adding input_shape=input.shape[1:] to my LSTM layer; that throws no errors, but the accuracy is no better than blind guessing.
This seems like the kind of thing that could be trivial, but I'm clearly missing something.
With keras.layers.LSTM(10), you need to include the input data shape: keras.layers.LSTM(10, input_shape=(input.shape[1], input.shape[2])).
Keras expects the input data shaped as [instances, time, predictors], and since you don't have any additional predictors, you may need to reshape your input data to input.reshape(input.shape[0], input.shape[1], 1).
Keras will infer the data shapes for the next layers, but the first layer needs the input shape defined.
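Putting that together, a minimal sketch (num_sequences and timesteps are assumed names for your data's dimensions, and labels is the one-hot array from the question):

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Shape the data as [instances, time, predictors]; with a single predictor
# the last dimension is 1. num_sequences and timesteps are assumptions.
X = np.asarray(input).reshape(num_sequences, timesteps, 1)

model = keras.Sequential([
    keras.layers.LSTM(10, input_shape=(X.shape[1], X.shape[2])),
    keras.layers.Dense(10, activation='relu'),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X, labels, epochs=30, verbose=2, shuffle=False)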
While trying to implement an LSTM network for trajectory classification, I have been struggling to get decent classification results even for simple trajectories. My training accuracy also keeps fluctuating without increasing significantly, which can be seen in TensorBoard:
[TensorBoard plot: training accuracy]
This is my model:
from keras.models import Sequential
from keras.layers import LSTM, Dense

model1 = Sequential()
model1.add(LSTM(8, dropout=0.2, return_sequences=True, input_shape=(40, 2)))
model1.add(LSTM(8, return_sequences=True))
model1.add(LSTM(8, return_sequences=False))
model1.add(Dense(1, activation='sigmoid'))
and my training code:
model1.compile(optimizer='adagrad', loss='binary_crossentropy', metrics=['accuracy'])
hist1 = model1.fit(dataScatter[:, 70:110, :], outputScatter,
                   validation_split=0.25, epochs=50, batch_size=20,
                   callbacks=[tensorboard], verbose=2)
I think the problem is probably due to the input and output data shapes, since the model itself seems fine. The input data has shape (2000, 40, 2) and the output has shape (2000, 1).
Can anyone spot a mistake?
Try to change:
model1.add(Dense(1, activation='sigmoid'))
to:
model1.add(TimeDistributed(Dense(1, activation='sigmoid')))
(note that this also requires return_sequences=True on the last LSTM layer, so that per-timestep outputs are available for TimeDistributed to iterate over).
TimeDistributed applies the same Dense layer (the same weights) to the LSTM's output for one time step at a time.
I recommend this tutorial as well https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/ .
I was able to increase the accuracy to 97% with a few data-related adjustments. The main obstacle was an unbalanced dataset split between the training and validation sets. Further improvement came from normalizing the input trajectories. I also increased the number of cells in the first layer.
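A sketch of those data-side fixes, with variable names following the question (using sklearn's train_test_split for the balanced split is an assumption):

import numpy as np
from sklearn.model_selection import train_test_split

# Stratified split keeps the class ratio equal in training and validation.
X_train, X_val, y_train, y_val = train_test_split(
    dataScatter[:, 70:110, :], outputScatter,
    test_size=0.25, stratify=outputScatter)

# Normalize trajectories with statistics from the training portion only.
mean = X_train.mean(axis=(0, 1), keepdims=True)
std = X_train.std(axis=(0, 1), keepdims=True)
X_train = (X_train - mean) / std
X_val = (X_val - mean) / std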
I am new to neural networks and have two probably pretty basic questions. I am setting up a generic LSTM network to predict the future of a sequence, based on multiple features.
My training data is therefore of the shape (number of training sequences, length of each sequence, number of features per timestep).
Or, to make it more specific, something like (2000, 10, 3).
I try to predict the value of one feature, not of all three.
Problem:
If I make my network deeper and/or wider, the only output I get is the constant mean of the values to be predicted. Take this setup for example:
from keras.models import Model
from keras.layers import Input, LSTM, Dense
from keras.callbacks import ReduceLROnPlateau, EarlyStopping

z0 = Input(shape=[None, len(dataset[0])])
z = LSTM(32, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z0)
z = LSTM(32, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(64, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(64, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(128, activation='softsign', recurrent_activation='softsign')(z)
z = Dense(1)(z)
model = Model(inputs=z0, outputs=z)
print(model.summary())
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(trainX, trainY, validation_split=0.1, epochs=200, batch_size=32,
                    callbacks=[ReduceLROnPlateau(factor=0.67, patience=3, verbose=1, min_lr=1E-5),
                               EarlyStopping(patience=50, verbose=1)])
If I just use one layer, like:
z0 = Input(shape=[None, len(dataset[0])])
z = LSTM(4, activation='softsign', recurrent_activation='softsign')(z0)
z = Dense(1)(z)
model = Model(inputs=z0, outputs=z)
print(model.summary())
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(trainX, trainY, validation_split=0.1, epochs=200, batch_size=32,
                    callbacks=[ReduceLROnPlateau(factor=0.67, patience=3, verbose=1, min_lr=1E-5),
                               EarlyStopping(patience=200, verbose=1)])
The predictions are somewhat reasonable; at least they are not constant anymore.
Why does that happen? Around 2000 samples is not that many, but in the case of overfitting I would expect the predictions to match the training data perfectly...
EDIT: Solved, as stated in the comments: Keras always expects batches.
When I use:
`test=model.predict(trainX[0])`
to get the prediction for the first sequence, I get a dimension error:
"Error when checking : expected input_1 to have 3 dimensions, but got array with shape (3, 3)"
I need to feed in an array of sequences like:
`test=model.predict(trainX[0:1])`
This is a workaround, but I am not really sure whether this has any deeper meaning or is just a syntax thing...
This is because you have not normalised your input data.
Any neural network model will initially have its weights distributed around zero. Since your training dataset has all positive values, the model will try to adjust its weights to predict only positive values. However, the activation function (in your case softsign) will map large values to 1, so the model can do little except add a bias. That is why you are getting an almost constant line around the average value of the dataset.
To fix this, you can use a general tool like sklearn to pre-process your data. If you are using a pandas DataFrame, something like this will help:
data_df = (data_df - data_df.mean()) / data_df.std()
Or, to have the normalisation parameters inside the model, you can consider adding a batch normalization layer to your model.
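A minimal sketch of that alternative, reusing the one-layer model above (num_features is an assumption matching the (2000, 10, 3) data):

from keras.models import Model
from keras.layers import Input, BatchNormalization, LSTM, Dense

num_features = 3  # assumed: three features per timestep
z0 = Input(shape=[None, num_features])
z = BatchNormalization()(z0)  # standardizes features inside the model
z = LSTM(4, activation='softsign', recurrent_activation='softsign')(z)
z = Dense(1)(z)
model = Model(inputs=z0, outputs=z)
model.compile(loss='mean_squared_error', optimizer='adam')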