I am trying to approximate function with keras model, that has only one hidden layer and whatever I do - I can't reach necessary result.
I'm trying to do it with following code
from __future__ import print_function
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from LABS.ZeroLab import E_Function as dataset5
train_size = 2000
# 2 model and data initializing
(x_train, y_train), (x_test, y_test) = dataset5.load_data(train_size=train_size, show=True)
model = Sequential()
model.add(Dense(50, kernel_initializer='he_uniform', bias_initializer='he_uniform', activation='sigmoid'))
model.add(Dense(1, kernel_initializer='he_uniform', bias_initializer='he_uniform', activation='linear'))
model.compile(optimizer=Adam(), loss='mae', metrics=['mae'])
history = model.fit(x=x_train, y=y_train, batch_size=20, epochs=10000, validation_data=(x_test, y_test), verbose=1)
It's function that loads from dataset5
It's comparison of model prediction with testing data
I tryied to fit this network with different optimizers and neurons number (from 50 to 300), but result was the same.
What should I change?
I found solution! The main issue was train data. I forgot to shuffle x_train and y_train before fitting.
I approximated it with 2 hidden layers succesfully, but I still can't to approximate it with 1 hidden layer.
Related
I have trained a LSTM model to predict multiple output value.
Predicted values are almost same even though the loss is less. Why is it so? How can I improve it?
`from keras import backend as K
import math
from sklearn.metrics import mean_squared_error, mean_absolute_error
from keras.layers.core import Dense, Dropout, Activation
def create_model():
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(40000, 7)))
model.add(LSTM(50, return_sequences= True))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(2, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()
return model
model = create_model()
model.fit(X_train, Y_train, shuffle=False, verbose=1, epochs=10)
prediction = model.predict(X_test, verbose=0)
print(prediction)
prediction =
[[0.26766795 0.00193274]
[0.2676593 0.00192017]
[0.2676627 0.00193239]
[0.2676644 0.00192784]
[0.26766634 0.00193461]
[0.2676624 0.00192487]
[0.26766685 0.00193129]
[0.26766685 0.00193165]
[0.2676621 0.00193216]
[0.26766127 0.00192624]]
`
calculate mean_relative error
`mean_relative_error = tf.reduce_mean(tf.abs((Y_test-prediction)/Y_test))
print(mean_relative_error)`
`mean_relative_error= 1.9220362`
It means you are just closing the values of x as nearest to y. Just like mapping x -> y. The Relative Error is saying to me that your y's are relatively small and when you are taking the mean difference between y_hat and y they are close enough...
To Break this symmetry you should increase the number of LSTM Cells and add a Dropout to it, also make sure to put an L1-Regularization term into your Dense Layers.
Decrease the number of neurons from each Dense Layer and increase the network size, also change your loss from "mean_squared_error" to "mean_absolute_error".
One more thing use Adagrad with a learning_rate of 1, instead of Adam Optimizer.
I have a classified network for the MNIST dataset (csv) with 10 labels which are numbers (0,1,2,3,4,5,6,7,8,9) and after training the network, I run the predict_classes for test_data. I want to know for each of the data in test_set what is the score of every label(0,1,2,3,4,5,6,7,8,9) in y_pred.
for example if predict_classes say that for first data the label is "7" what is the score of 7 and what is the scores of other labels such (0,1,2,3,4,5,6,8,9)
How can I write its code?
from keras import models
import numpy as np
from keras import layers
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, BatchNormalization
mnist = tf.keras.datasets.mnist
#Load dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
#normalizing data
x_train = x_train / 255.0
x_test = x_test / 255.0
# bulid model
model = Sequential()
model.add(LSTM(15, input_shape=(x_train.shape[1:]), return_sequences=True, activation="tanh", unroll=True))
model.add(LSTM(15, dropout=0.2, activation="tanh", unroll=True))
#model.add(LSTM(1, activation='tanh'))
#model.add(LSTM(2, activation='tanh'))
model.add(Dense(5, activation='tanh' ))
model.add(Dense(10, activation='sigmoid'))
model.summary()
opt = tf.keras.optimizers.Adam(lr=1e-3, decay=1e-5)
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt,
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))
y_pred = model.predict_classes(x_test)
print(y_pred)
Instead of using model.predict_classes(), you can use model.predict() (https://www.tensorflow.org/api_docs/python/tf/keras/Model#predict).
This returns an array with the probability for all of the possible classes.
I am unsure how to interpret the default behavior of Keras in the following situation:
My Y (ground truth) was set up using scikit-learn's MultilabelBinarizer().
Therefore, to give a random example, one row of my y column is one-hot encoded as such:
[0,0,0,1,0,1,0,0,0,0,1].
So I have 11 classes that could be predicted, and more than one can be true; hence the multilabel nature of the problem. There are three labels for this particular sample.
I train the model as I would for a non multilabel problem (business as usual) and I get no errors.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
model = Sequential()
model.add(Dense(5000, activation='relu', input_dim=X_train.shape[1]))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(y_train.shape[1], activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
optimizer=sgd,
metrics=['accuracy',])
model.fit(X_train, y_train,epochs=5,batch_size=2000)
score = model.evaluate(X_test, y_test, batch_size=2000)
score
What does Keras do when it encounters my y_train and sees that it is "multi" one-hot encoded, meaning there is more than one 'one' present in each row of y_train? Basically, does Keras automatically perform multilabel classification? Any differences in the interpretation of the scoring metrics?
In short
Don't use softmax.
Use sigmoid for activation of your output layer.
Use binary_crossentropy for loss function.
Use predict for evaluation.
Why
In softmax when increasing score for one label, all others are lowered (it's a probability distribution). You don't want that when you have multiple labels.
Complete Code
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras.optimizers import SGD
model = Sequential()
model.add(Dense(5000, activation='relu', input_dim=X_train.shape[1]))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(y_train.shape[1], activation='sigmoid'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy',
optimizer=sgd)
model.fit(X_train, y_train, epochs=5, batch_size=2000)
preds = model.predict(X_test)
preds[preds>=0.5] = 1
preds[preds<0.5] = 0
# score = compare preds and y_test
Answer from Keras Documentation
I am quoting from keras document itself.
They have used output layer as dense layer with sigmoid activation. Means they also treat multi-label classification as multi-binary classification with binary cross entropy loss
Following is model created in Keras documentation
shallow_mlp_model = keras.Sequential(
[
layers.Dense(512, activation="relu"),
layers.Dense(256, activation="relu"),
layers.Dense(lookup.vocabulary_size(), activation="sigmoid"),
] # More on why "sigmoid" has been used here in a moment.
Keras doc link::
https://keras.io/examples/nlp/multi_label_classification/
I followed a tutorial on youtube and I accidentally didn't add model.add(Dense(6, activation='relu')) on Keras and I got 36% accuracy. After I added this code it rised to 86%. Why did this happen?
This is the code
from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
np.random.seed(3)
classifications = 3
dataset = np.loadtxt('wine.csv', delimiter=",")
X = dataset[:,1:14]
Y = dataset[:,0:1]
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.66,
random_state=5)
y_train = keras.utils.to_categorical(y_train-1, classifications)
y_test = keras.utils.to_categorical(y_test-1, classifications)
model = Sequential()
model.add(Dense(10, input_dim=13, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(6, activation='relu')) # This is the code I missed
model.add(Dense(6, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(2, activation='relu'))
model.add(Dense(classifications, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=
['accuracy'])
model.fit(x_train, y_train, batch_size=15, epochs=2500, validation_data=
(x_test, y_test))
Number of layers is an hyper parameter just like learning rate,no of neurons.
These play an important role in determining the accuracy.
So in your case.
model.add(Dense(6, activation='relu'))
This layer played the key roll.
We cannot understand what exactly these layers are actually doing.
The best we can do is to do hyper parameter tuning to get the best combination of hyper parameters.
In my opinion, maybe it's the ratio of your training set to your test set. You have 66% of your test set, so it's possible that training with this model will be under fitting. So one less layer of dense will have a greater change in the accuracy . You put test_size = 0.2 and try again the change in the accuracy of the missing layer.
I am unsure how to interpret the default behavior of Keras in the following situation:
My Y (ground truth) was set up using scikit-learn's MultilabelBinarizer().
Therefore, to give a random example, one row of my y column is one-hot encoded as such:
[0,0,0,1,0,1,0,0,0,0,1].
So I have 11 classes that could be predicted, and more than one can be true; hence the multilabel nature of the problem. There are three labels for this particular sample.
I train the model as I would for a non multilabel problem (business as usual) and I get no errors.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
model = Sequential()
model.add(Dense(5000, activation='relu', input_dim=X_train.shape[1]))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(y_train.shape[1], activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
optimizer=sgd,
metrics=['accuracy',])
model.fit(X_train, y_train,epochs=5,batch_size=2000)
score = model.evaluate(X_test, y_test, batch_size=2000)
score
What does Keras do when it encounters my y_train and sees that it is "multi" one-hot encoded, meaning there is more than one 'one' present in each row of y_train? Basically, does Keras automatically perform multilabel classification? Any differences in the interpretation of the scoring metrics?
In short
Don't use softmax.
Use sigmoid for activation of your output layer.
Use binary_crossentropy for loss function.
Use predict for evaluation.
Why
In softmax when increasing score for one label, all others are lowered (it's a probability distribution). You don't want that when you have multiple labels.
Complete Code
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras.optimizers import SGD
model = Sequential()
model.add(Dense(5000, activation='relu', input_dim=X_train.shape[1]))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(y_train.shape[1], activation='sigmoid'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy',
optimizer=sgd)
model.fit(X_train, y_train, epochs=5, batch_size=2000)
preds = model.predict(X_test)
preds[preds>=0.5] = 1
preds[preds<0.5] = 0
# score = compare preds and y_test
Answer from Keras Documentation
I am quoting from keras document itself.
They have used output layer as dense layer with sigmoid activation. Means they also treat multi-label classification as multi-binary classification with binary cross entropy loss
Following is model created in Keras documentation
shallow_mlp_model = keras.Sequential(
[
layers.Dense(512, activation="relu"),
layers.Dense(256, activation="relu"),
layers.Dense(lookup.vocabulary_size(), activation="sigmoid"),
] # More on why "sigmoid" has been used here in a moment.
Keras doc link::
https://keras.io/examples/nlp/multi_label_classification/