Keras outputs probabilities instead of classes - python

I am trying to implement a simple neural network for multi-class classification in Keras. The code is:
model = Sequential()
model.add(Dense(512, input_dim = 55 , kernel_regularizer=l2(0.00001),
activation = 'relu'))
model.add(Dense(8, activation = 'softmax'))
model.compile(loss = 'categorical_crossentropy' , optimizer = 'adam' , metrics = ['accuracy'] )
model.fit(X_train, dummy_y, epochs = 20, batch_size = 30, class_weight=class_weights)
I have 55 features and I want to predict one of 8 classes (0,1,2,3,4,5,6,7). I also encode y_train like this:
encoder = LabelEncoder()
encoder.fit(y_train)
encoded_Y = encoder.transform(y_train)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
However, when I use predict(), the output is an array with the probability of each class:
array([[3.3881092e-01, 2.6201099e-06, 1.9504215e-03, ..., 7.0641324e-02,
4.4026113e-01, 1.2641836e-02],
[2.3457911e-02, 5.5409328e-04, 2.8759112e-05, ..., 2.1585675e-03,
5.5625242e-01, 1.0208529e-01],
[4.6981460e-01, 2.0882198e-05, 1.4895502e-01, ..., 1.3179567e-01,
2.2908358e-01, 1.4160757e-03],
...
How should I modify the network in order to output the class with the highest probability? Like this:
[[0,5,7,3,2,0,0,.....]]

You can simply use the predict_classes method (available on Sequential models in older Keras versions; it was removed in TensorFlow 2.6):
preds_classes = model.predict_classes(X_test)
The numbers returned by the predict method are the probabilities (confidence scores) of each class. Therefore, as an alternative solution, you can take the index of the maximum score, which corresponds to the predicted class:
import numpy as np
probs = model.predict(X_test)
classes = np.argmax(probs, axis=-1)
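To make the argmax step concrete, here is a minimal sketch on a hand-written probability array (the numbers and the label names are made up for illustration, not taken from the question). The `original_labels` array stands in for the `classes_` attribute of the fitted LabelEncoder, which is how the integer indices would map back to the original class values.

```python
import numpy as np

# Made-up softmax outputs for 3 samples and 3 classes (each row sums to 1).
probs = np.array([
    [0.34, 0.01, 0.65],
    [0.80, 0.15, 0.05],
    [0.10, 0.70, 0.20],
])

# Index of the largest probability in each row = predicted class index.
class_index = np.argmax(probs, axis=-1)

# Hypothetical label array; with the question's code this would be encoder.classes_.
original_labels = np.array(["bird", "cat", "dog"])
predicted_labels = original_labels[class_index]

print(class_index)
print(predicted_labels)
```

With a fitted LabelEncoder, `encoder.inverse_transform(class_index)` should give the same mapping back to the original labels.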

Related

How should the inputs map to label y if Keras Model.fit() is given a list of input training arrays?

I am trying to work with a deep learning model in the following two scenarios, where two different inputs are given. I want to achieve the following:
Train two models (with different weights but the same architecture) on the same input and concatenate the results. So in model.fit(), I pass just the trainX value. The code is given below. It works fine.
import pandas as pd
import keras
from keras.layers import Input, Conv1D, GlobalMaxPooling1D, Dense
from keras.models import Model
from sklearn.model_selection import train_test_split
def create_model(input_tensor):
    # Each call creates new layers, so each branch gets its own weights.
    x = Conv1D(filters=16, kernel_size=6, strides=5, kernel_initializer="uniform", activation="relu")(input_tensor)
    x = GlobalMaxPooling1D()(x)
    x = Dense(2, activation='softmax')(x)
    return x
dataframe = pd.read_csv(Filename, index_col=0)
X = dataframe.values[:, :].astype(float)
Y = dataframe.values[:, 1]
trainX, testX, trainy, testy = train_test_split(X, Y, test_size=0.2, random_state=200, shuffle=True)
input_shape = (33000, 1)
input_tensor = Input(input_shape)
pred_a = create_model(input_tensor)
pred_b = create_model(input_tensor)
out = keras.layers.Multiply()([pred_a, pred_b])
model = Model(inputs=input_tensor, outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])
history = model.fit(trainX, trainy)
Train the same model (with the same weights) twice but with different inputs. I am confused about how to pass the inputs in this case. Normally, trainX and trainy have the same number of instances. If I pass a list like model.fit([x_train_1, x_train_2], trainy), then x_train_1 and x_train_2 combined will have twice as many instances as y. How does trainy correspond to the inputs in this case?
The input and corresponding output of a model have shapes X = (batch_size, ...) and y = (batch_size, ...), so they share the first (sample) dimension.
In the case of multiple inputs, you can define multiple input layers and feed them to your different model instances as follows:
inp_A = Input(shape=(...))
inp_B = Input(shape=(...))
pred_A = create_model(inp_A)
pred_B = create_model(inp_B)
*** Other layers and code ****
model = Model(inputs=[inp_A, inp_B], outputs=out)
*** Other code ***
Then you can call model.fit, passing a list of inputs and a single output.
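The shape rule above can be illustrated with plain NumPy, without building a model. This is only a sketch: the array sizes are arbitrary and `check_aligned` is a hypothetical helper, not a Keras function. The point is that the arrays in the input list are not stacked along the sample axis; row i of every input array belongs to the same sample and shares label `trainy[i]`.

```python
import numpy as np

n_samples = 6  # arbitrary size chosen for illustration

# Two input arrays, one per Input layer, with the same number of rows.
x_train_1 = np.random.rand(n_samples, 10, 1)
x_train_2 = np.random.rand(n_samples, 10, 1)
trainy = np.random.randint(0, 2, size=(n_samples,))

def check_aligned(inputs, targets):
    """Hypothetical helper: True if every input has as many rows as the targets."""
    sizes = {arr.shape[0] for arr in inputs}
    return sizes == {targets.shape[0]}

aligned = check_aligned([x_train_1, x_train_2], trainy)

# One training sample = the i-th row of each input plus the i-th label.
sample_0 = ([x[0] for x in (x_train_1, x_train_2)], trainy[0])

print(aligned, len(sample_0[0]))
```

So a call like model.fit([x_train_1, x_train_2], trainy) sees `n_samples` samples, not twice that many.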

How to set up an LSTM network to predict multi-sequence?

I am learning how to set up the RNN-LSTM network for prediction. I have created the dataset with one input variable.
x y
1 2.5
2 6
3 8.6
4 11.2
5 13.8
6 16.4
...
With the following Python code, I created windowed data, like [x(t-2), x(t-1), x(t)], to predict [y(t)]:
df= pd.read_excel('dataset.xlsx')
# split a univariate dataset into train/test sets
def split_dataset(data):
    train, test = data[:-328], data[-328:-6]
    return train, test
train, test = split_dataset(df.values)
# scale train and test data
def scale(train, test):
    # fit scaler
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaler = scaler.fit(train)
    # transform train
    # train = train.reshape(train.shape[0], train.shape[1])
    train_scaled = scaler.transform(train)
    # transform test
    # test = test.reshape(test.shape[0], test.shape[1])
    test_scaled = scaler.transform(test)
    return scaler, train_scaled, test_scaled
scaler, train_scaled, test_scaled = scale(train, test)
def to_supervised(train, n_input, n_out=7):
    # flatten data
    data = train
    X, y = list(), list()
    in_start = 0
    # step over the entire history one time step at a time
    for _ in range(len(data)):
        # define the end of the input sequence
        in_end = in_start + n_input
        out_end = in_end + n_out
        # ensure we have enough data for this instance
        if out_end <= len(data):
            x_input = data[in_start:in_end, 0]
            x_input = x_input.reshape((len(x_input), 1))
            X.append(x_input)
            y.append(data[in_end:out_end, 0])
        # move along one time step
        in_start += 1
    return np.array(X), np.array(y)
train_x, train_y = to_supervised(train_scaled, n_input = 3, n_out = 1)
test_x, test_y = to_supervised(test_scaled, n_input = 3, n_out = 1)
verbose, epochs, batch_size = 0, 20, 16
n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
model = Sequential()
model.add(LSTM(200, return_sequences= False, input_shape = (train_x.shape[1],train_x.shape[2])))
model.add(Dense(1))
model.compile(loss = 'mse', optimizer = 'adam')
history = model.fit(train_x, train_y, epochs=epochs, verbose=verbose, validation_data = (test_x, test_y))
However, I have other questions about this:
Q1: What is the meaning of units in LSTM? [model.add(LSTM(units, ...))]
(I have tried different numbers of units; the model became more accurate as the number of units increased.)
Q2: How many layers should I set?
Q3: How can I predict multiple steps? E.g. based on (x(t), x(t-1)), predict y(t), y(t+1). I tried setting n_out = 2 in the to_supervised function, but when I applied the same method, it returned an error:
train_x, train_y = to_supervised(train_scaled, n_input = 3, n_out = 2)
test_x, test_y = to_supervised(test_scaled, n_input = 3, n_out = 2)
verbose, epochs, batch_size = 0, 20, 16
n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
model = Sequential()
model.add(LSTM(200, return_sequences= False, input_shape = (train_x.shape[1],train_x.shape[2])))
model.add(Dense(1))
model.compile(loss = 'mse', optimizer = 'adam')
history = model.fit(train_x, train_y, epochs=epochs, verbose=verbose, validation_data = (test_x, test_y))
ValueError: Error when checking target: expected dense_27 to have shape (1,) but got array with shape (2,)
Q3 (cont.): What should I add or change in the model settings?
Q3 (cont.): What is return_sequences? When should I set it to True?
Q1. Units in LSTM is the number of neurons in your LSTM layer.
Q2. That depends on your model and data. Try changing the number of layers to see the effect.
Q3. That depends on which approach you take.
Q4. Ideally you'll want to predict a single time step at a time.
It is possible to predict several at once, but in my experience you will get better results with the approach described below, e.g.:
use y(t-1), y(t) to predict y_hat(t+1)
THEN
use y(t), y_hat(t+1) to predict y_hat(t+2)
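The one-step-at-a-time scheme above can be sketched as a loop that feeds each prediction back into the input window. This is a minimal illustration only: `recursive_forecast` and `toy_model` are hypothetical names, and `toy_model` is a simple linear extrapolation standing in for a trained network's one-step prediction.

```python
import numpy as np

def recursive_forecast(predict_one_step, history, n_input, n_steps):
    """Predict n_steps ahead by reusing each prediction as the newest input."""
    window = list(history[-n_input:])
    forecasts = []
    for _ in range(n_steps):
        y_hat = predict_one_step(np.array(window))
        forecasts.append(y_hat)
        # Drop the oldest value and append the new prediction.
        window = window[1:] + [y_hat]
    return np.array(forecasts)

def toy_model(window):
    # Hypothetical stand-in for a trained model: extrapolate the last difference.
    return float(2 * window[-1] - window[-2])

history = np.array([2.5, 5.1, 7.7, 10.3])
forecast = recursive_forecast(toy_model, history, n_input=2, n_steps=3)
print(forecast)
```

With a Keras model, `predict_one_step` would presumably reshape the window to (1, n_input, n_features), call model.predict, and return the scalar result. Note that errors can accumulate with this approach, since later predictions depend on earlier ones.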
Are you sure you're actually using X to predict Y in this case?
What do train x/y and test x/y look like?
Re Q1: It is the number of LSTM cells (=LSTM units), which consist of several neurons themselves but have (in the standard case as given) only one output each. Thus, the number of units corresponds directly to the dimensionality of your output.
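Regarding the shape error reported in Q3: the message says the final Dense layer produces one value per sample while the targets have two. The sketch below reproduces only the windowing step in NumPy to show where the (2,) comes from. `make_windows` is a simplified, hypothetical version of the question's to_supervised function working on a 1-D series.

```python
import numpy as np

def make_windows(series, n_input, n_out):
    """Slide over a 1-D series, returning inputs (N, n_input, 1) and targets (N, n_out)."""
    X, y = [], []
    for start in range(len(series) - n_input - n_out + 1):
        end = start + n_input
        X.append(series[start:end].reshape(n_input, 1))
        y.append(series[end:end + n_out])
    return np.array(X), np.array(y)

series = np.arange(10, dtype=float)

X1, y1 = make_windows(series, n_input=3, n_out=1)
X2, y2 = make_windows(series, n_input=3, n_out=2)

print(X1.shape, y1.shape)
print(X2.shape, y2.shape)
```

Because each target now has two values, an output layer with one unit no longer matches. Under these assumptions, sizing the last layer from the data, for example Dense(n_outputs) with n_outputs = train_y.shape[1] as already computed in the question, should make the shapes agree.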

tf.keras.Sequential binary classification model predicting [0.5, 0.5] or close to

I am currently trying to build a model to classify whether or not the outcome of a given football match will be above or below 2.5 goals, based on the Home team, Away team & game league, using a tf.keras.Sequential model in TensorFlow 2.0RC.
The problem I am encountering is that my softmax results converge on [0.5,0.5] when using the model.predict method. What makes this odd is that my validation & test accuracy and losses are about 0.94 & 0.12 respectively after 1000 epochs of training, otherwise I would have put this down to an overfitting problem. I am aware that 1000 epochs is extremely likely to overfit, however, I want to understand why my accuracy increases until about 800 epochs in. My loss flattens at about 300 epochs.
I have tried to alter the number of layers, number of units in each layer, the activation functions, optimizers and loss functions, number of epochs and learning rates, but can only seem to increase the losses.
The results still seem to converge toward [0.5,0.5] regardless.
The full code can be viewed at https://github.com/AhmUgEk/tensorflow_football_predictions, but below is an extract showing model composition.
# Create Keras Sequential model:
model = keras.Sequential()
model.add(feature_layer) # Input processing layer.
model.add(Dense(units=32, activation='relu')) # Hidden Layer 1.
model.add(Dropout(rate=0.4))
model.add(BatchNormalization())
model.add(Dense(units=32, activation='relu')) # Hidden Layer 2.
model.add(Dropout(rate=0.4))
model.add(BatchNormalization())
model.add(Dense(units=2, activation='softmax')) # Output layer.
# Compile the model:
model.compile(
optimizer=keras.optimizers.Adam(learning_rate=0.0001),
loss=keras.losses.MeanSquaredLogarithmicError(),
metrics=['accuracy']
)
# Fit the model to the training dataset and validate against the validation dataset between epochs:
model.fit(
train_dataset,
validation_data=val_dataset,
epochs=1000,
callbacks=[tensorboard_callback]
)
I would expect to receive a result of [0.282, 0.718] for example for an input of:
model.predict_classes([np.array(['E0'], dtype='object'),
np.array(['Liverpool'], dtype='object'),
np.array(['Newcastle'], dtype='object')])[0]
but as per the above, receive a result of say [0.5, 0.5].
Am I missing something obvious here?
I made some minor changes to the model. Now I am not getting exactly [0.5, 0.5].
Result:
[[0.61482537 0.3851746 ]
[0.5121426 0.48785746]
[0.48058605 0.51941395]
[0.48913187 0.51086813]
[0.45480043 0.5451996 ]
[0.48933673 0.5106633 ]
[0.43431875 0.5656812 ]
[0.55314165 0.4468583 ]
[0.5365097 0.4634903 ]
[0.54371756 0.45628244]]
Implementation:
import datetime
import os
import numpy as np
import pandas as pd
import tensorflow as tf
from gpu_limiter import limit_gpu
from pipe_functions import csv_to_df, dataframe_to_dataset
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras.layers import BatchNormalization, Dense, DenseFeatures, Dropout, Input
from tensorflow.keras.callbacks import TensorBoard, ModelCheckpoint
import tensorflow.keras.backend as K
from tensorflow.data import Dataset
# Test GPU availability and instantiate memory growth limitation if True:
if tf.test.is_gpu_available():
    print('GPU Available\n')
    limit_gpu()
else:
    print('Running on CPU')
df = csv_to_df("./csv_files")
# Format & organise imported data, making the "Date" column the new index:
df['Date'] = pd.to_datetime(df['Date'])
df = df[['Date', 'Div', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG']].dropna().set_index('Date').sort_index()
df['Over_2.5'] = (df['FTHG'] + df['FTAG'] > 2.5).astype(int)
df = df.drop(['FTHG', 'FTAG'], axis=1)
# Split data into training, validation and testing data:
# Note: random_state variable set to ensure reproducibility.
train, test = train_test_split(df, test_size=0.05, random_state=42)
train, val = train_test_split(train, test_size=0.05, random_state=42)
# print(df['Over_2.5'].value_counts()) # Check that data is balanced.
# Create datasets from train, val & test dataframes:
target_col = 'Over_2.5'
batch_size = 32
def df_to_dataset(features: np.ndarray, labels: np.ndarray, shuffle=True, batch_size=8) -> Dataset:
    ds = Dataset.from_tensor_slices(({"feature": features}, {"target": labels}))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(features))
    ds = ds.batch(batch_size)
    return ds
def get_feature_transform() -> DenseFeatures:
    # Format features into feature columns to ensure data is in the correct format for feeding into the model:
    feature_cols = []
    for column in filter(lambda x: x != target_col, df.columns):
        feature_cols.append(tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_vocabulary_list(
            key=column, vocabulary_list=df[column].unique()), dimension=5))
    return DenseFeatures(feature_cols)
# Transforms all features into dense tensors.
feature_transform = get_feature_transform()
train_features = feature_transform(dict(train)).numpy()
val_features = feature_transform(dict(val)).numpy()
test_features = feature_transform(dict(test)).numpy()
train_dataset = df_to_dataset(train_features, train[target_col].values, shuffle=True, batch_size=batch_size)
val_dataset = df_to_dataset(val_features, val[target_col].values, shuffle=True, batch_size=batch_size) # Shuffle not required to validation data.
test_dataset = df_to_dataset(test_features, test[target_col].values, shuffle=True, batch_size=batch_size) # Shuffle not required to test data.
# Create Keras Functional API:
# Create a feature layer from the feature columns, to be placed at the input layer of the model:
def build_model(input_shape: tuple) -> keras.Model:
    input_layer = keras.Input(shape=input_shape, name='feature')
    model = Dense(units=1028, activation='relu', kernel_initializer='normal', name='dense0')(input_layer) # Hidden Layer 1.
    model = BatchNormalization(name='bc0')(model)
    model = Dense(units=1028, activation='relu', kernel_initializer='normal', name='dense1')(model) # Hidden Layer 2.
    model = Dropout(rate=0.1)(model)
    model = BatchNormalization(name='bc1')(model)
    model = Dense(units=100, activation='relu', kernel_initializer='normal', name='dense2')(model) # Hidden Layer 3.
    model = Dropout(rate=0.25)(model)
    model = BatchNormalization(name='bc2')(model)
    model = Dense(units=50, activation='relu', kernel_initializer='normal', name='dense3')(model) # Hidden Layer 4.
    model = Dropout(rate=0.4)(model)
    model = BatchNormalization(name='bc3')(model)
    output_layer = Dense(units=2, activation='softmax', kernel_initializer='normal', name='target')(model) # Output layer.
    model = keras.Model(inputs=input_layer, outputs=output_layer, name='better-than-chance')
    # Compile the model:
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss='mse',
        metrics=['accuracy']
    )
    return model
# # Create a TensorBoard log file (time appended) directory for every run of the model:
# directory = ".\\logs\\" + str(datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
# os.mkdir(directory)
# # Create a TensorBoard callback to log a record of model performance for every 1 epoch:
# tensorboard_callback = TensorBoard(log_dir=directory, histogram_freq=1, write_graph=True, write_images=True)
# Run "tensorboard --logdir .\logs" in anaconda prompt to review & compare logged results.
# Note: Make sure that the correct environment is activated before running.
model = build_model((train_features.shape[1],))
model.summary()
# checkpoint = ModelCheckpoint('model-{epoch:03d}.h5', verbose=1, monitor='val_loss',save_best_only=True, mode='auto')
# Fit the model to the training dataset and validate against the validation dataset between epochs:
model.fit(
train_dataset,
validation_data=val_dataset,
epochs=10)
# callbacks=[checkpoint]
# Saves and reloads model.
# model.save("./model.h5")
# model_from_saved = keras.models.load_model("./model.h5")
# Evaluate model accuracy against test dataset:
# scores, accuracy = model.evaluate(train_dataset)
# print('Accuracy:', accuracy)
##############
## OPTIONAL ##
##############
# DEBUGGING
# inp = model.input # input placeholder
# outputs = [layer.output for layer in model.layers] # all layer outputs
# functors = [K.function([inp], [out]) for out in outputs] # evaluation functions
# # Testing
# layer_outs = [func([test_features]) for func in functors]
# print(layer_outs)
# # # Form a prediction based on inputs:
prediction = model.predict({"feature": test_features[:10]})
print(prediction)
One thing you can do is try some ensemble learning methods, such as RandomForest and XGBoost, and compare the results.
You should also try adding other Key Performance Indicators (KPIs) to your data and then fitting the model.

Take accuracy of n high probability output from Keras Lstm model

I have an LSTM model for sequence prediction, which is shown here:
def create_model(max_sequence_len, total_words):
    input_len = max_sequence_len - 1
    model = keras.models.Sequential()
    model.add(layers.Embedding(total_words, 50, input_length=input_len))
    model.add(layers.LSTM(50, input_shape=predictors[:1].shape))
    model.add(layers.Dropout(0.2))
    model.add(layers.Dense(activation='softmax', units = total_words))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'], lr=0.01)
    return model
model_sb = create_model(max_sequence_len, total_words)
history = model_sb.fit(X_train, y_train, epochs = 20 , shuffle = True, validation_split=0.3, )
It works well, but I want to take the 2 outputs with the highest probability from the softmax dense layer.
To get them, I can use this code:
predicted = model_sb.predict(test_sequence, verbose=1)
And then use this code to find the n highest-probability outputs:
y_sum = predicted.sum(axis=0)
ind = np.argpartition(y_sum, -n)[-n:]
ind[np.argsort(y_sum[ind])]
But I need to know the accuracy of my model when the true output is any one of these n outputs (an "or" condition).
Is there a package that can help me?
I mean, I don't want to evaluate my model using only the single most probable output; I want to evaluate accuracy and loss using the 2 most probable outputs.
This is called top-k accuracy, with k = 2 in your case. Keras already has an implementation of this accuracy:
from keras.metrics import top_k_categorical_accuracy
def my_acc(y_true, y_pred):
    return top_k_categorical_accuracy(y_true, y_pred, k=2)
Then you pass this custom metric to your model:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=[my_acc])
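To show what top-k accuracy computes, here is a small NumPy sketch that can also be applied to the array returned by model.predict after training. It is an illustration of the idea, not the Keras implementation; the function name `top_k_accuracy` and the example arrays are made up. A sample counts as correct if its true class is among the k highest-probability classes.

```python
import numpy as np

def top_k_accuracy(y_true, y_prob, k=2):
    """Fraction of samples whose true class is among the k most probable classes.

    y_true: one-hot array of shape (n_samples, n_classes)
    y_prob: predicted probabilities of shape (n_samples, n_classes)
    """
    true_idx = np.argmax(y_true, axis=-1)
    # Indices of the k largest probabilities in each row.
    top_k = np.argsort(y_prob, axis=-1)[:, -k:]
    hits = (top_k == true_idx[:, None]).any(axis=-1)
    return hits.mean()

# Made-up example: true classes are 0, 2 and 1.
y_true = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]])
y_prob = np.array([
    [0.50, 0.30, 0.20],
    [0.40, 0.35, 0.25],
    [0.10, 0.20, 0.70],
])

acc_top1 = top_k_accuracy(y_true, y_prob, k=1)
acc_top2 = top_k_accuracy(y_true, y_prob, k=2)
print(acc_top1, acc_top2)
```

In this example only the first sample is right at k = 1, while the third sample also counts at k = 2 because its true class is the second most probable.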

How do I get the predicted labels from a model.predict function from Keras?

I have built an LSTM model using the Keras library to predict duplicate questions on the official Quora dataset. The test labels are 0 or 1, where 1 indicates that the question pair is a duplicate. After building the model using model.fit, I test the model using model.predict on the test data. The output is an array of values (probabilities) like below:
[ 0.00514298]
[ 0.15161049]
[ 0.27588326]
[ 0.00236167]
[ 1.80067325]
[ 0.01048524]
[ 1.43425131]
[ 1.99202418]
[ 0.54853892]
[ 0.02514757]
I am only showing the first 10 values in the array. I don't understand what these values mean or how to compare them against the test labels to calculate the test accuracy. I want the model to output binary predictions (0 or 1) rather than probabilities. Please refer to the last section of my code below:
sequence_1_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences_1 = embedding_layer(sequence_1_input)
x1 = lstm_layer(embedded_sequences_1)
sequence_2_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences_2 = embedding_layer(sequence_2_input)
y1 = lstm_layer(embedded_sequences_2)
merged = concatenate([x1, y1])
merged = Dropout(rate_drop_dense)(merged)
merged = BatchNormalization()(merged)
merged = Dense(num_dense, activation=act)(merged)
merged = Dropout(rate_drop_dense)(merged)
merged = BatchNormalization()(merged)
preds = Dense(1, activation='sigmoid')(merged)
########################################
## train the model
########################################
model = Model(inputs=[sequence_1_input, sequence_2_input], \
outputs=preds)
model.compile(loss='binary_crossentropy',
optimizer='nadam',
metrics=['acc'])
hist = model.fit([data_1_train, data_2_train], labels_train, \
validation_data=([data_1_val, data_2_val], labels_val, weight_val), \
epochs=200, batch_size=2048, shuffle=True, \
class_weight=class_weight, callbacks=[early_stopping, model_checkpoint])
preds = model.predict([test_data_1, test_data_2], batch_size=8192,
verbose=1)
preds += model.predict([test_data_2, test_data_1], batch_size=8192,
verbose=1)
preds /= 2
print(type(preds))
print(preds[:20])
print('preds.ravel')
print(preds.ravel())
As you say, your output is a NumPy array of probabilities. You can convert it to binary labels with, for example, (model.predict(X) > 0.5).astype(int).
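As a minimal sketch of that thresholding step, the snippet below uses a made-up array in place of the model.predict output and made-up test labels, then compares the two to get an accuracy figure.

```python
import numpy as np

# Made-up sigmoid outputs with shape (n_samples, 1), as returned by model.predict.
probs = np.array([[0.005], [0.15], [0.62], [0.98]])

# Threshold at 0.5 and flatten to a 1-D label vector.
labels = (probs > 0.5).astype(int).ravel()

# Made-up ground-truth labels for the same samples.
y_test = np.array([0, 0, 1, 0])

accuracy = (labels == y_test).mean()
print(labels, accuracy)
```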
Artificial neural networks are probabilistic classifiers, so your output is absolutely fine. It's just the probability of belonging to your target label.
In addition, 0.5 may not be the threshold you want to use. It depends on how important true positives and false positives are in your task. You can look at ROC curves to find the optimal threshold.
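One way to pick a threshold from the ROC curve is to sweep candidate thresholds and keep the one that maximizes the true-positive rate minus the false-positive rate (Youden's J statistic). That criterion is an assumption chosen here for illustration; the right trade-off depends on the task. The function name `best_threshold` and the data below are made up.

```python
import numpy as np

def best_threshold(y_true, scores):
    """Return the threshold (and its J value) that maximizes TPR - FPR."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    best_t, best_j = 0.5, -1.0
    # Every distinct score is a candidate threshold.
    for t in np.unique(scores):
        pred = scores >= t
        tpr = (pred & y_true).sum() / max(y_true.sum(), 1)
        fpr = (pred & ~y_true).sum() / max((~y_true).sum(), 1)
        j = tpr - fpr
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Made-up labels and predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.8, 0.2, 0.7]

threshold, j_value = best_threshold(y_true, scores)
predicted = (np.asarray(scores) >= threshold).astype(int)
print(threshold, j_value, predicted)
```

In this toy data all positives score above all negatives, so the selected threshold separates them perfectly and is lower than 0.5. The threshold should be chosen on validation data rather than on the test set.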
You can try changing the activation function of your last layer to softmax, or you can write your own softmax function and pass your output to it. Here's an example of a custom softmax function:
def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)
