I'm learning keras and trying to approximate sin. Everything seems fine, my loss decreases. But the problem is that accuracy doesn't change. Here is the code:
from tensorflow import keras
from keras.layers import Dense
import numpy as np
X = np.linspace(0,10,100)
y = np.sin(X)
model = keras.Sequential([
Dense(100,input_shape=(1,), activation='linear',name='relu'),
Dense(100,activation='relu', name='layer_2'),
Dense(1,activation='tanh', name='layer_3')
])
model.compile(loss='mse',optimizer='adam',metrics=['accuracy'])
model.summary()
history = model.fit(X,y,epochs=100, batch_size = 2,verbose=0,validation_split=0.2)
And here are the graphics which I got:
With loss='mse', the accuracy is going to be weirded out since it is not a classification task. Normally when doing a regression task you do not put metrics=['accuracy']. I suggest to not worry about it, as your loss is decreasing.
Related
I came across a weird difference between keras model.fit() and sklearn model.fit() functions. When model.fit() is called inside a loop I get inconsistent predictions using a Keras sequential model. This is not the case when using an sklearn model. See sample code to reproduce the phenomenon.
from numpy.random import seed
seed(1337)
import tensorflow as tf
tf.random.set_seed(1337)
from sklearn.linear_model import LogisticRegression
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import InputLayer
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler
import numpy as np
def get_sequential_dnn(NUM_COLS, NUM_ROWS):
# code for model
if __name__ == "__main__":
input_size = 10
X, y = make_blobs(n_samples=100, centers=2, n_features=input_size,
random_state=1
)
scalar = MinMaxScaler()
scalar.fit(X)
X = scalar.transform(X)
model = get_sequential_dnn(X.shape[1], X.shape[0])
# print(model.summary())
# model = LogisticRegression()
for i in range(2):
model.fit(X, y, epochs=100, verbose=0, shuffle=False)
# model.fit(X, y)
Xnew, _ = make_blobs(n_samples=3, centers=2, n_features=10, random_state=1)
Xnew = scalar.transform(Xnew)
# make a prediction
# ynew = model.predict_proba(Xnew)[:, 1]
ynew = model.predict_proba(Xnew)
ynew = np.array(ynew)
# show the inputs and predicted outputs
print('--------------')
for i in range(len(Xnew)):
print("X=%s \n Predicted=%s" % (Xnew[i], ynew[i]))
The output of this is
--------------
X=[0.32799209 0.32682211 0.62699485 0.89987274 0.59894281 0.94662653
0.77125788 0.73345369 0.2153754 0.35317172]
Predicted=[0.9931685]
X=[0.60876924 0.33208319 0.24770841 0.11435312 0.66211608 0.17361879
0.12891829 0.25729677 0.69975833 0.73165292]
Predicted=[0.35249507]
X=[0.65154993 0.26153846 0.2416324 0.11793901 0.7047334 0.17706289
0.07761879 0.45189967 0.8481064 0.85092378]
Predicted=[0.35249507]
--------------
X=[0.32799209 0.32682211 0.62699485 0.89987274 0.59894281 0.94662653
0.77125788 0.73345369 0.2153754 0.35317172]
Predicted=[1.]
X=[0.60876924 0.33208319 0.24770841 0.11435312 0.66211608 0.17361879
0.12891829 0.25729677 0.69975833 0.73165292]
Predicted=[0.17942095]
X=[0.65154993 0.26153846 0.2416324 0.11793901 0.7047334 0.17706289
0.07761879 0.45189967 0.8481064 0.85092378]
Predicted=[0.17942095]
While if I use a Logistic Regression (un-comment the commented lines) the predictions are consistent:
--------------
X=[0.32799209 0.32682211 0.62699485 0.89987274 0.59894281 0.94662653
0.77125788 0.73345369 0.2153754 0.35317172]
Predicted=0.929209043999009
X=[0.60876924 0.33208319 0.24770841 0.11435312 0.66211608 0.17361879
0.12891829 0.25729677 0.69975833 0.73165292]
Predicted=0.04643513037543502
X=[0.65154993 0.26153846 0.2416324 0.11793901 0.7047334 0.17706289
0.07761879 0.45189967 0.8481064 0.85092378]
Predicted=0.038716408758471876
--------------
X=[0.32799209 0.32682211 0.62699485 0.89987274 0.59894281 0.94662653
0.77125788 0.73345369 0.2153754 0.35317172]
Predicted=0.929209043999009
X=[0.60876924 0.33208319 0.24770841 0.11435312 0.66211608 0.17361879
0.12891829 0.25729677 0.69975833 0.73165292]
Predicted=0.04643513037543502
X=[0.65154993 0.26153846 0.2416324 0.11793901 0.7047334 0.17706289
0.07761879 0.45189967 0.8481064 0.85092378]
Predicted=0.038716408758471876
I get that the obvious solution to this is fit the model before the loop, probably there is a strong randomness how Keras models fit the data to the labels, but there are a couple of cases where you need to have a loop to get prediction scores. For example if you want to perform a 10-fold cross validation to get the AUC, sensitivity, specificity values on a training data. In these situations this randomness is unacceptable.
What is causing this inconsistency and what is the solution to it?
There are couple of issue with the way your are trying to make reproducible results with keras.
You are calling the fit (when i==1) over the already fitted model (when i==0). So the optimizer sees different sets of inital weights in both the cases and so you will end up in two different models. Solution: Get a fresh model everytime. This is not the case with sklearn, which starts with fresh initialized weights every time a fit is called.
model.fit internally might use a current stage of random number generator. You seeded it outside the loop, so the state will be different when fit is called the second time. Solution: Seed inside the loop.
Sample code with issue
# Issue 2 here
tf.random.set_seed(1337)
def get_model():
model = Sequential()
model.add(Dense(4, input_dim=8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
return model
X = np.random.randn(10,8)
y = np.random.randn(10,1)
# Issue 1 here
model = get_model()
results = []
for i in range(10):
model.fit(X, y, epochs=5, verbose=0, shuffle=False)
results.append(np.sum(model.predict(X)))
assert np.all(np.isclose(results, results[0]))
As you can see the assert fails
Corrected code
results = []
for i in range(10):
tf.random.set_seed(1337)
model = get_model()
model.fit(X, y, epochs=5, verbose=0, shuffle=False)
results.append(np.sum(model.predict(X)))
assert np.all(np.isclose(results, results[0]))
I am an electrical engineer and I am looking for a solution to calculate the DC current of a permanent synchronous motor. So I decided to check the ANN solutions with Keras and so on.Long story short, I'll show you a screenshot of some measured signals.
The first 5 signals are the measured signals. The last one is the DC current, which I will estimate. Here the value was recorded with the help of a current clamp. Okay, I started building a model in Python and tried some things that I assume will increase the accuracy of the model. But after all that, I am not getting that good results from the model and my hope is that maybe I am choosing wrong parameters or not an ideal model for this purpose.
Here is my code:
import numpy as np
from keras.layers import Dense, LSTM
from keras.models import Sequential
from keras.callbacks import EarlyStopping
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from matplotlib import pyplot as plt
import seaborn as sns
# Import input (x) and output (y) data, and asign these to df1 and df1
df = pd.read_csv('train_data.csv')
df = df[['rpm','iq','uq','udc','idc']]
X = df[df.columns[:-1]]
Y = df.idc
plt.figure()
sns.heatmap(df.corr(),annot=True)
plt.show()
# Split the data into input (x) training and testing data, and ouput (y) training and testing data,
# with training data being 80% of the data, and testing data being the remaining 20% of the data
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)#, shuffle=True)
# Scale both training and testing input data
X_train = preprocessing.maxabs_scale(X_train)
X_test = preprocessing.maxabs_scale(X_test)
model = Sequential()
model.add(Dense(4, input_shape=(4,)))
model.add(Dense(4, input_shape=(4,)))
model.add(Dense(1, input_shape=(4,)))
model.compile(optimizer="adam", loss="msle", metrics=['mean_squared_logarithmic_error','accuracy'])
# Pass several parameters to 'EarlyStopping' function and assign it to 'earlystopper'
earlystopper = EarlyStopping(monitor='val_loss', min_delta=0, patience=15, verbose=1, mode='auto')
model.summary()
history = model.fit(X_train, y_train, epochs = 2000, validation_split = 0.3, verbose = 2, callbacks = [earlystopper])
# Runs model (the one with the activation function, although this doesn't really matter as they perform the same)
# with its current weights on the training and testing data
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)
# Calculates and prints r2 score of training and testing data
print("The R2 score on the Train set is:\t{:0.3f}".format(r2_score(y_train, y_train_pred)))
print("The R2 score on the Test set is:\t{:0.3f}".format(r2_score(y_test, y_test_pred)))
df = pd.read_csv('test_two_data.csv')
df = df[['rpm','iq','uq','udc','idc']]
X = df[df.columns[:-1]]
Y = df.idc
X_validate = preprocessing.maxabs_scale(X)
y_pred = model.predict(X_validate)
plt.plot(Y)
plt.plot(y_pred)
plt.show()
(weight_0,bias_0) = model.layers[0].get_weights()
(weight_1,bias_1) = model.layers[1].get_weights()
One limitation is that I can't use LSTM layers or other complex algorithms because I need to implement the trained model in a microcontroller on a motor application later.
I guess you could find some words for me to make my model a little better in accuracy.
At the end here is a figure where I show you the worse prediction performance. Orange is the prediction and blue is the measured current.
The training dataset was this one.
The correlation between the individual values can be found here. Since the values of id and ud have no correlation to idc, I decided to delete them.
The most important thing to keep in mind when trying to improve the accuracy of the model is ALWAYS Normalise the input data which basically means rescaling real-valued numeric attributes into the range 0 and 1. I am not able to understand the way you are providing the training data to the model. Could you please explain that. It would be better in understanding and identifying the scope of higher accuracy.
Now if we talk about parameters, I would suggest you the addition of a Tuning Algorithm for the parameters to get the optimized value of each parameter.
It is always a good parctice to include hidden layers which could provide better feature extract.
So I build a GRU model and I'm comparing 3 different datasets on the same model. I was just running the first dataset and set the number of epochs to 25, but I have noticed that my validation loss is increasing just after the 6th epoch, doesn't that indicate overfitting, am I doing something wrong?
import pandas as pd
import tensorflow as tf
from keras.layers.core import Dense
from keras.layers.recurrent import GRU
from keras.models import Sequential
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from google.colab import files
from tensorboardcolab import TensorBoardColab, TensorBoardColabCallback
tbc=TensorBoardColab() # Tensorboard
df10=pd.read_csv('/content/drive/My Drive/Isolation Forest/IF 10 PERCENT.csv',index_col=None)
df2_10= pd.read_csv('/content/drive/My Drive/2019 Dataframe/2019 10minutes IF 10 PERCENT.csv',index_col=None)
X10_train= df10[['WindSpeed_mps','AmbTemp_DegC','RotorSpeed_rpm','RotorSpeedAve','NacelleOrientation_Deg','MeasuredYawError','Pitch_Deg','WindSpeed1','WindSpeed2','WindSpeed3','GeneratorTemperature_DegC','GearBoxTemperature_DegC']]
X10_train=X10_train.values
y10_train= df10['Power_kW']
y10_train=y10_train.values
X10_test= df2_10[['WindSpeed_mps','AmbTemp_DegC','RotorSpeed_rpm','RotorSpeedAve','NacelleOrientation_Deg','MeasuredYawError','Pitch_Deg','WindSpeed1','WindSpeed2','WindSpeed3','GeneratorTemperature_DegC','GearBoxTemperature_DegC']]
X10_test=X10_test.values
y10_test= df2_10['Power_kW']
y10_test=y10_test.values
# scaling values for model
x_scale = MinMaxScaler()
y_scale = MinMaxScaler()
X10_train= x_scale.fit_transform(X10_train)
y10_train= y_scale.fit_transform(y10_train.reshape(-1,1))
X10_test= x_scale.fit_transform(X10_test)
y10_test= y_scale.fit_transform(y10_test.reshape(-1,1))
X10_train = X10_train.reshape((-1,1,12))
X10_test = X10_test.reshape((-1,1,12))
# creating model using Keras
model10 = Sequential()
model10.add(GRU(units=512, return_sequences=True, input_shape=(1,12)))
model10.add(GRU(units=256, return_sequences=True))
model10.add(GRU(units=256))
model10.add(Dense(units=1, activation='sigmoid'))
model10.compile(loss=['mse'], optimizer='adam',metrics=['mse'])
model10.summary()
history10=model10.fit(X10_train, y10_train, batch_size=256, epochs=25,validation_split=0.20, verbose=1, callbacks=[TensorBoardColabCallback(tbc)])
score = model10.evaluate(X10_test, y10_test)
print('Score: {}'.format(score))
y10_predicted = model10.predict(X10_test)
y10_predicted = y_scale.inverse_transform(y10_predicted)
y10_test = y_scale.inverse_transform(y10_test)
plt.plot( y10_predicted, label='Predicted')
plt.plot( y10_test, label='Measurements')
plt.legend()
plt.savefig('/content/drive/My Drive/Figures/Power Prediction 10 Percent.png')
plt.show()
LSTMs(and also GRUs in spite of their lighter construction) are notorious for easily overfitting.
Reduce the number of units(the output size) in each of the layers(32(layer1)-64(layer2); you could also eliminate the last layer altogether.
The second of all, you are using the activation 'sigmoid', but your loss function + metric is mse.
Ensure that your problem is either a regression or a classification one. If it is indeed a regression, then the activation function should be 'linear' at the last step. If it is a classification one, you should change your loss_function to binary_crossentropy and your metric to 'accuracy'.
Therefore, the plot displayed is just misleading for the moment. If you modify like I suggested and you still get such a train-val loss plot, then we can state for sure that you have an overfitting case.
I am trying simple multinomial logistic regression using Keras, but the results are quite different compared to standard scikit-learn approach.
For example with iris data:
import numpy as np
import pandas as pd
df = pd.read_csv("./data/iris.data", header=None)
from sklearn.model_selection import train_test_split
df_train, df_test = train_test_split(df, test_size=0.3, random_state=52)
X_train = df_train.drop(4, axis=1)
y_train = df_train[4]
X_test = df_test.drop(4, axis=1)
y_test = df_test[4]
Using scikit-learn:
from sklearn.linear_model import LogisticRegression
scikit_model = LogisticRegression(multi_class='multinomial', solver ='saga', max_iter=500)
scikit_model.fit(X_train, y_train)
the average weighted f1-score on test set:
y_test_pred = scikit_model.predict(X_test)
from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_pred, scikit_model.classes_))
is 0.96.
Then with Keras:
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils
# first we have to encode class values as integers
encoder = LabelEncoder()
encoder.fit(y_train)
y_train_encoded = encoder.transform(y_train)
Y_train = np_utils.to_categorical(y_train_encoded)
y_test_encoded = encoder.transform(y_test)
Y_test = np_utils.to_categorical(y_test_encoded)
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from keras.regularizers import l2
#model construction
input_dim = 4 # 4 variables
output_dim = 3 # 3 possible outputs
def classification_model():
model = Sequential()
model.add(Dense(output_dim, input_dim=input_dim, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
return model
#training
keras_model = classification_model()
keras_model.fit(X_train, Y_train, epochs=500, verbose=0)
the average weighted f1-score on test set:
classes = np.argmax(keras_model.predict(X_test), axis = 1)
y_test_pred = encoder.inverse_transform(classes)
from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_pred, encoder.classes_))
is 0.89.
Is it possible to perform identical (or at least as much as possible) logistic regression with Keras as with scikit-learn?
I tried to run your examples and noticed a couple of potential sources:
The test set is incredibly small, only 45 instances. This means that to get from accuracy of .89 to .96, the model only needs to predict just three more instances correctly. Due to randomness in training, your Keras results can oscillate quite a bit.
As explained by #meowongac https://stackoverflow.com/a/59643522/1467943, you're using a different optimizer. One point is that scikit's algorithm will automatically set its learning rate. For SGD in Keras, tweaking learning rate and/or number of epochs could lead to improvements.
Scikit learn quietly uses L2 regularization by default.
Using your code, I was able to get accuracy ranging from .89 to .96 by running SGD with learning rate set to .05. When switching to Adam (also with this quite high learning rate), I got more stable results ranging from .92 to .96 (although this is more of an impression as I didn't run too many trials).
One obvious difference is saga (a variant of SAG) is used in LogisticRegression while SGD is used in your NN. As far as I know, LogisticRegression doesn't support SGD. Alternatively you can use SGDRegressor or SGDClassifier instead of LogisticRegression. And here is a blog discussing the differences between them.
I have created a single layer neural network with two outputs (one for each class, 0 or 1) trained with the sigmoid method and SGD optimizer. I have also trained the NN without any hidden layer. Furthermore I have validated the performance of the model using StratifiedKFold with 4 splits. The model trained is designed with a lr=0.1 and epochs=150, however, I don´t know if these values are optimising the model. For this reason, I would like to run 20 combinations of the learning rate parameter and epochs to see the most accurate result and for which combination of these parameters I´m obtaining it. Below the restrictions:
epochs: values between 10 and 150
learning rate: values between 0.01 and 1
Please, see below the code:
from sklearn.model_selection import StratifiedKFold
from keras.models import Sequential
from keras.layers.core import Dense
from keras import layers
from keras.optimizers import SGD
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
#Function to create the NN model
def create_model():
#Neural Network model
ann = Sequential()
#Number of columns of training dataset
n_cols = x_train.shape[1]
#Output
ann.add(Dense(units=1,activation='sigmoid',input_shape=(n_cols,)))
#SGD Optimizer
sgd = SGD(lr=0.1)
#Compile with SGD optimizer and binary_crossentropy
ann.compile(optimizer=sgd,
loss='binary_crossentropy',
metrics=['accuracy'])
return ann
#Creating the model
model=KerasClassifier(build_fn=create_model,epochs=150,batch_size=10,verbose=0)
#Evaluating the model using StratifiedKFold
kfold = StratifiedKFold(n_splits=4, shuffle=True, random_state=2)
results=cross_val_score(model,x_train,y_train,cv=kfold)
#Accuracies
print(results)
To create the 20 combinations formed by the learning rate and epochs, firstly, I have created random values of lr and epochs:
#Epochs
epo = np.random.randint(10,150)
#Learning Rate
learn = np.random.randint(0.01,1)
My problem is that I don´t know how to fit this into the code of the NN in order to find which is the combination that gives the best accuracy of the model.
there is no need to optimize the number of epochs you could easily use early stop which will stop when there is no improvement in your loss or accuracy
so just set your epochs a big number (for example 300 ) and add:
keras.callbacks.callbacks.EarlyStopping(monitor='val_loss', min_delta=0.1)
you also can call the best weights (just before the model started to overfit) by:
restore_best_weights=True
First in the create_model() function you defined the optimizer you passed the learning rate as parameter to:
#SGD Optimizer
sgd = SGD(lr=0.1)
This is the starting learning rate of the optimization process, from this point the optimizer handles the optimal learning rate.
Nevertheless you can pass several starting learning rate inside a loop calling repeatedly the create_model() function and passsing a learning rate parameter to that.
Furthermore as parsa mentioned choosing the right epoch number is based on the validation result, which shows where your model became overfitted. There is the point is, where the epoch number reached its optimum.