This is a regression problem. Below is my code
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.cross_validation import cross_val_score, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
os.chdir(r'C:\Users\Swapnil\Desktop\RP TD\first\Changes')
## Load the dataset
dataset1 = pd.read_csv("Main Lane Plaza 1.csv")
X_train = dataset1.iloc[:,0:11].values
Y_train = dataset1.iloc[:,11].values
dataset2 = pd.read_csv("Main Lane Plaza 1_070416010117.csv")
X_test = dataset2.iloc[:,0:11].values
Y_test = dataset2.iloc[:,11].values
##Define base model
def base_model():
model = Sequential()
model.add(Dense(11, input_dim=11, kernel_initializer='normal',
activation='sigmoid'))
model.add(Dense(7, kernel_initializer='normal', activation='sigmoid'))
model.add(Dense(1, kernel_initializer='normal'))
model.compile(loss='mean_squared_error', optimizer = 'adam')
return model
seed = 7
np.random.seed(seed)
clf = KerasRegressor(build_fn=base_model, nb_epoch=100,
batch_size=5,verbose=0)
clf.fit(X_train, Y_train)
res = clf.predict(X_train)
##Result
clf.score(X_test, Y_test)
Not sure if the score should be negative??
Kindly advise if i am doing something wrong.
Thanks in advance.
I am not able to figure it out can this be problem due to feature scaling as I did feature scaling using R and saved the csv files to use in python.
When you get a negative score for regression problem, it usually means that your the model you choose can't fit your data well.
You have layer 1 activation as sigmoid, layer 2 also as sigmoid and then final layer as 1 output.
change the activations to relu, as sigmoid would be squashing the values between 0 to 1. Making the numbers really small, causing the vanishing gradient problem over the 2 hidden layer.
def base_model():
model = Sequential()
model.add(Dense(11, input_dim=11, kernel_initializer='normal', activation='relu'))
model.add(Dense(7, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
model.compile(loss='mean_squared_error', optimizer='adam')
return model
Related
I have this code:
from numpy import loadtxt
from keras.models import Sequential
from keras.layers import Dense
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=150, batch_size=10)
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))
I need to change the output column so it predicts/learns from a score(for instance 1 to a million) instead of 0 or 1(sigmoid).
As for your case you need to use relu as your activation function in the last layer (output layer) instead of sigmoid
The range of relu is [0,inf).Then in that case you need to use 'MSE' as your loss metric.
Conceptually, the problem which you are trying to solve is a regression type of problem.
I am unsure how to interpret the default behavior of Keras in the following situation:
My Y (ground truth) was set up using scikit-learn's MultilabelBinarizer().
Therefore, to give a random example, one row of my y column is one-hot encoded as such:
[0,0,0,1,0,1,0,0,0,0,1].
So I have 11 classes that could be predicted, and more than one can be true; hence the multilabel nature of the problem. There are three labels for this particular sample.
I train the model as I would for a non multilabel problem (business as usual) and I get no errors.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
model = Sequential()
model.add(Dense(5000, activation='relu', input_dim=X_train.shape[1]))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(y_train.shape[1], activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
optimizer=sgd,
metrics=['accuracy',])
model.fit(X_train, y_train,epochs=5,batch_size=2000)
score = model.evaluate(X_test, y_test, batch_size=2000)
score
What does Keras do when it encounters my y_train and sees that it is "multi" one-hot encoded, meaning there is more than one 'one' present in each row of y_train? Basically, does Keras automatically perform multilabel classification? Any differences in the interpretation of the scoring metrics?
In short
Don't use softmax.
Use sigmoid for activation of your output layer.
Use binary_crossentropy for loss function.
Use predict for evaluation.
Why
In softmax when increasing score for one label, all others are lowered (it's a probability distribution). You don't want that when you have multiple labels.
Complete Code
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras.optimizers import SGD
model = Sequential()
model.add(Dense(5000, activation='relu', input_dim=X_train.shape[1]))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(y_train.shape[1], activation='sigmoid'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy',
optimizer=sgd)
model.fit(X_train, y_train, epochs=5, batch_size=2000)
preds = model.predict(X_test)
preds[preds>=0.5] = 1
preds[preds<0.5] = 0
# score = compare preds and y_test
Answer from Keras Documentation
I am quoting from keras document itself.
They have used output layer as dense layer with sigmoid activation. Means they also treat multi-label classification as multi-binary classification with binary cross entropy loss
Following is model created in Keras documentation
shallow_mlp_model = keras.Sequential(
[
layers.Dense(512, activation="relu"),
layers.Dense(256, activation="relu"),
layers.Dense(lookup.vocabulary_size(), activation="sigmoid"),
] # More on why "sigmoid" has been used here in a moment.
Keras doc link::
https://keras.io/examples/nlp/multi_label_classification/
I understand the Vanishing and exploding gradients problem in Vanilla RNNs and why this happens. However, I would like to create this problem purposefully to understand in a better way. I have taken a below code from https://www.datatechnotes.com/2018/12/rnn-example-with-keras-simplernn-in.html.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
# convert into dataset matrix
def convertToMatrix(data, step):
X, Y =[], []
for i in range(len(data)-step):
d=i+step
X.append(data[i:d,])
Y.append(data[d,])
return np.array(X), np.array(Y)
step = 4
N = 1000
Tp = 800
t=np.arange(0,N)
x=np.sin(0.02*t)+2*np.random.rand(N)
df = pd.DataFrame(x)
df.head()
plt.plot(df)
plt.show()
values=df.values
train,test = values[0:Tp,:], values[Tp:N,:]
# add step elements into train and test
test = np.append(test,np.repeat(test[-1,],step))
train = np.append(train,np.repeat(train[-1,],step))
trainX,trainY =convertToMatrix(train,step)
testX,testY =convertToMatrix(test,step)
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
model = Sequential()
model.add(SimpleRNN(units=32, input_shape=(1,step), activation="relu"))
model.add(Dense(8, activation="relu"))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='rmsprop')
model.summary()
model.fit(trainX,trainY, epochs=100, batch_size=16, verbose=2)
trainPredict = model.predict(trainX)
testPredict= model.predict(testX)
predicted=np.concatenate((trainPredict,testPredict),axis=0)
trainScore = model.evaluate(trainX, trainY, verbose=0)
print(trainScore)
How should I modify this code to create this problem? Thank you.
Vanishing gradient is the problem when we use the sigmoid activation function. If you change relu to sigmoid, you may encounter vanishing gradient problem.
model = Sequential()
model.add(SimpleRNN(units=32, input_shape=(1,step), activation="sigmoid"))
model.add(Dense(8, activation="sigmoid"))
I followed a tutorial on youtube and I accidentally didn't add model.add(Dense(6, activation='relu')) on Keras and I got 36% accuracy. After I added this code it rised to 86%. Why did this happen?
This is the code
from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
np.random.seed(3)
classifications = 3
dataset = np.loadtxt('wine.csv', delimiter=",")
X = dataset[:,1:14]
Y = dataset[:,0:1]
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.66,
random_state=5)
y_train = keras.utils.to_categorical(y_train-1, classifications)
y_test = keras.utils.to_categorical(y_test-1, classifications)
model = Sequential()
model.add(Dense(10, input_dim=13, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(6, activation='relu')) # This is the code I missed
model.add(Dense(6, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(2, activation='relu'))
model.add(Dense(classifications, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=
['accuracy'])
model.fit(x_train, y_train, batch_size=15, epochs=2500, validation_data=
(x_test, y_test))
Number of layers is an hyper parameter just like learning rate,no of neurons.
These play an important role in determining the accuracy.
So in your case.
model.add(Dense(6, activation='relu'))
This layer played the key roll.
We cannot understand what exactly these layers are actually doing.
The best we can do is to do hyper parameter tuning to get the best combination of hyper parameters.
In my opinion, maybe it's the ratio of your training set to your test set. You have 66% of your test set, so it's possible that training with this model will be under fitting. So one less layer of dense will have a greater change in the accuracy . You put test_size = 0.2 and try again the change in the accuracy of the missing layer.
I am unsure how to interpret the default behavior of Keras in the following situation:
My Y (ground truth) was set up using scikit-learn's MultilabelBinarizer().
Therefore, to give a random example, one row of my y column is one-hot encoded as such:
[0,0,0,1,0,1,0,0,0,0,1].
So I have 11 classes that could be predicted, and more than one can be true; hence the multilabel nature of the problem. There are three labels for this particular sample.
I train the model as I would for a non multilabel problem (business as usual) and I get no errors.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
model = Sequential()
model.add(Dense(5000, activation='relu', input_dim=X_train.shape[1]))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(y_train.shape[1], activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
optimizer=sgd,
metrics=['accuracy',])
model.fit(X_train, y_train,epochs=5,batch_size=2000)
score = model.evaluate(X_test, y_test, batch_size=2000)
score
What does Keras do when it encounters my y_train and sees that it is "multi" one-hot encoded, meaning there is more than one 'one' present in each row of y_train? Basically, does Keras automatically perform multilabel classification? Any differences in the interpretation of the scoring metrics?
In short
Don't use softmax.
Use sigmoid for activation of your output layer.
Use binary_crossentropy for loss function.
Use predict for evaluation.
Why
In softmax when increasing score for one label, all others are lowered (it's a probability distribution). You don't want that when you have multiple labels.
Complete Code
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras.optimizers import SGD
model = Sequential()
model.add(Dense(5000, activation='relu', input_dim=X_train.shape[1]))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(y_train.shape[1], activation='sigmoid'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy',
optimizer=sgd)
model.fit(X_train, y_train, epochs=5, batch_size=2000)
preds = model.predict(X_test)
preds[preds>=0.5] = 1
preds[preds<0.5] = 0
# score = compare preds and y_test
Answer from Keras Documentation
I am quoting from keras document itself.
They have used output layer as dense layer with sigmoid activation. Means they also treat multi-label classification as multi-binary classification with binary cross entropy loss
Following is model created in Keras documentation
shallow_mlp_model = keras.Sequential(
[
layers.Dense(512, activation="relu"),
layers.Dense(256, activation="relu"),
layers.Dense(lookup.vocabulary_size(), activation="sigmoid"),
] # More on why "sigmoid" has been used here in a moment.
Keras doc link::
https://keras.io/examples/nlp/multi_label_classification/