Making predictions on the entire test set - python

I want to make predictions on the entire test set. Here the test set is only 20% of dataset A; I understand that this is because it is only for training purposes. When I save the weights and then make predictions on another dataset B, will it also split dataset B into a test set?
How can I make predictions on the entire dataset B using the weights of the model trained on dataset A?
Thanks.
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
# Independent variables:
X = dataset.iloc[:, :-1].values
# Dependent Variable:
y = dataset.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 27, kernel_initializer = 'uniform', activation = 'relu', input_dim = 6))
# Adding the second hidden layer
classifier.add(Dense(units = 27, kernel_initializer = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 10, epochs = 20)
#making predictions on test data
classifier.predict(X_test)

If I am understanding correctly, you want to use your trained model on a completely new dataset?
Keras provides several ways to do this, but I think the most common one is to export your trained model into an HDF5 (.h5) file using the command
model.save("filepath/model.h5")
Now you can load and use your model wherever you want using
from keras.models import load_model
model = load_model("filepath/model.h5")
score = model.evaluate(X, Y)
where X is the feature columns of dataset B and Y is the response, to get your scoring. If dataset B is available in the same session, you can always just use
model.predict(X)
where X is now the feature columns of dataset B.
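A minimal sketch of that save/load round trip, assuming you want to reuse the model from the question in a later script (the file names are placeholders):
from keras.models import load_model
classifier.save("model_datasetA.h5")      # saves architecture, weights and optimizer state
model = load_model("model_datasetA.h5")   # restore it later, no recompiling or refitting needed
Note that any preprocessing fitted on dataset A (the StandardScaler sc here) has to be persisted and reused as well, for example with joblib.dump(sc, "scaler.pkl") and joblib.load("scaler.pkl").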

From what I understand, you are asking two questions here.
First, a dataset only gets split into a train and test set because you do it manually in the line
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
If, when you use "dataset B", you want to test your classifier on ALL the data points of "dataset B", you do not have to do this train/test split at all; you can simply pass the X values of "dataset B" to your classifier.
As for how to do this, as per your second question, it is the same as what you have already done with "dataset A"'s test set:
classifier.predict(X) will make predictions using the fit it already learned on "dataset A", as long as you do not recompile the model or call .fit() again.
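A short sketch of that, assuming dataset B has been read into a DataFrame called datasetB with the same 6 feature columns in the same order, and that the fitted classifier and scaler sc from the question are still in scope (the variable names are placeholders, not from the original post):
X_B = datasetB.iloc[:, :-1].values   # every row of dataset B, no train/test split
X_B = sc.transform(X_B)              # reuse the scaler fitted on dataset A's training data
y_B_prob = classifier.predict(X_B)   # sigmoid outputs, one probability per row
y_B_class = (y_B_prob > 0.5)         # threshold to get 0/1 class labels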

Related

How to properly shape data to use with LSTM model?

I am working on an LSTM project for learning purposes, using time-series data with 3 columns [current, sma, target], where sma is the simple moving average. I extracted these values from the dataframe like so:
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
data = df[['current', 'sma', 'target']].values
# normalize the data to the [0, 1] range
scaler = MinMaxScaler(feature_range=(0,1))
dataset = scaler.fit_transform(data)
# then split inputs from targets
X = dataset[:, :2]
y = dataset[:, 2]
# split into xtrain ytrain xtest ytest
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
Everything works fine so far, and I understand it, but the uncharted territory for me is converting the X_* and y_* arrays into 3-D arrays to feed the model. I am using a simple model just to make this work; I am not looking for impressive results, this is purely educational.
The model that I will use:
import tensorflow as tf
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(128, input_shape=(timesteps, features), return_sequences=True))
model.add(tf.keras.layers.LSTM(64, return_sequences=False))
model.add(tf.keras.layers.Dense(features))
model.compile(loss='mean_squared_error', optimizer='adam')
How to reshape the data to feed it to the model?
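One common approach (a sketch under assumed choices, not from the original post: the window length of 10 is arbitrary) is to slide a fixed-length window over the series, so each sample becomes a (timesteps, features) block whose target is the value right after the window:
import numpy as np
def make_windows(X, y, timesteps=10):
    # stack `timesteps` consecutive rows of the 2 feature columns per sample
    Xs, ys = [], []
    for i in range(len(X) - timesteps):
        Xs.append(X[i:i + timesteps])   # shape (timesteps, 2)
        ys.append(y[i + timesteps])     # target just after the window
    return np.array(Xs), np.array(ys)
X_seq, y_seq = make_windows(X, y, timesteps=10)       # X_seq has shape (samples, 10, 2)
timesteps, features = X_seq.shape[1], X_seq.shape[2]  # values to plug into input_shape
For windowed time series it is usually better to split without shuffling, e.g. train_test_split(X_seq, y_seq, test_size=0.2, shuffle=False). Also note that with a single scalar target the final Dense layer would need units=1 rather than units=features.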

Are these 2 Keras deep learning codes the same for multiple outputs?

I have a problem involving airfoil velocity and pressure prediction, given AOA, x, y. I am using Keras with an MLP. I have 3 inputs (AOA, x, y) and have to predict 3 outputs (u, v, p). I initially had code that outputs the MSE loss as a single value. However, I modified the code so that I have an MSE for each output. Now the average of the 3 per-output MSEs (u_mean_squared_error: 73.63%, v_mean_squared_error: 1.13%, p_mean_squared_error: 2.16%) does not equal the earlier single MSE loss (mean_squared_error: 5.81%). Hence, I am wondering whether my new code is wrong, or whether I am doing it the right way. Can someone help?
Old code:
# load the airfoil dataset
import numpy
from sklearn.model_selection import train_test_split
from keras import layers
from keras.layers import Dense
from keras.models import Model
seed = 7  # placeholder value, any fixed integer for reproducibility
dataset = numpy.loadtxt("S1020_data.csv", delimiter=",")
# split into input and output variables
X = dataset[:,0:3]
Y = dataset[:,3:6]
# split into 67% for train and 33% for test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=seed)
# create model
input_data = layers.Input(shape=(3,))
#create the layers and pass them the input tensor to get the output tensor:
hidden1Out = Dense(units=12, activation='relu')(input_data)
hidden2Out = Dense(units=8, activation='relu')(hidden1Out)
finalOut = Dense(units=3, activation='relu')(hidden2Out)
#define the model's start and end points
model = Model(input_data, finalOut)
# Compile model
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test,y_test), epochs=10, batch_size=1000)
# evaluate the model
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
New code:
# load the airfoil dataset
dataset = numpy.loadtxt("S1020_data.csv", delimiter=",")
# split into input and output variables
X = dataset[:,0:3]
Y = dataset[:,3:6]
# split into 67% for train and 33% for test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=seed)
# create model
input_data = layers.Input(shape=(3,))
#create the layers and pass them the input tensor to get the output tensor:
hidden1Out = Dense(units=12, activation='relu')(input_data)
hidden2Out = Dense(units=8, activation='relu')(hidden1Out)
u_out = Dense(1, activation='relu', name='u')(hidden2Out)
v_out = Dense(1, activation='relu', name='v')(hidden2Out)
p_out = Dense(1, activation='relu', name='p')(hidden2Out)
#define the model's start and end points
model = Model(input_data,outputs = [u_out, v_out, p_out])
# Compile model
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])
# Fit the model
model.fit(X_train, [y_train[:,0], y_train[:,1], y_train[:,2]], validation_data=(X_test,[y_test[:,0], y_test[:,1], y_test[:,2]]), epochs=10, batch_size=1000)
# evaluate the model
scores = model.evaluate(X, [Y[:,0], Y[:,1], Y[:,2]])
for i in range(7):
    print("\n%s: %.2f%%" % (model.metrics_names[i], scores[i]*100))
I think the difference comes from how the objective and the metrics are computed in the two setups.
In your old code, the single 3-unit output uses one mean_squared_error, so the per-sample objective is
((u_true - u_pred)^2 + (v_true - v_pred)^2 + (p_true - p_pred)^2) / 3
and the reported mean_squared_error metric is that same average over the three outputs.
In the new code, each head has its own mean_squared_error and the total loss is their sum:
(u_true - u_pred)^2 + (v_true - v_pred)^2 + (p_true - p_pred)^2
So the loss is on a different scale, the metrics are now reported per output, and the two models come from separate training runs with different random initialisations, which is why the average of the three per-output MSEs does not reproduce the old single value.
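If you want the multi-output version to optimise a total loss on the same scale as the old averaged one, Keras lets you weight the per-output losses at compile time; a small sketch, assuming the three-output functional model defined above:
model.compile(optimizer='adam',
              loss='mean_squared_error',
              loss_weights=[1/3, 1/3, 1/3],  # average rather than sum the three per-output MSEs
              metrics=['mean_squared_error'])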

How can I get probability values for each class with the predict method on an ANN model in Keras

I'm new to deep learning and I need help getting individual probabilities for each class from a Keras artificial neural network (ANN) model. I have an exoplanet catalog dataset from PHL and I'm trying to predict whether a planet is habitable, maybe habitable, or not habitable. For now I have tried an ANN with some important columns like
dataToLearn = data[["P_DISTANCE","S_HZ_OPT_MIN", "S_HZ_OPT_MAX", "S_HZ_CON_MIN", "S_HZ_CON_MAX", "P_TYPE", "P_ESI", "P_HABITABLE"]]
class_names = list(dataToLearn.columns)
I got rid of some NaN values with
dataToLearn = dataToLearn.dropna(how='all')
dataToLearn = dataToLearn.dropna(subset=['P_TYPE', 'P_ESI'])
then preprocessed the data,
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
labelencoder_pType = LabelEncoder()
dataToLearn["P_TYPE"] = labelencoder_pType.fit_transform(dataToLearn["P_TYPE"])
onehotencoder = ColumnTransformer([("P_TYPE", OneHotEncoder(),[5])], remainder = "passthrough")
dataToLearn = onehotencoder.fit_transform(dataToLearn)
#Dummy Variable Trap
dataToLearn = dataToLearn[:,1:]
dataToLearn = pd.DataFrame(dataToLearn)
X = dataToLearn.iloc[:,:10].values
Y = dataToLearn.iloc[:,10].values
Y = pd.get_dummies(Y).values
x_train, x_test, y_train, y_test = train_test_split(X,Y,test_size = 0.35)
y_test = y_test.astype(np.float64)
y_train = y_train.astype(np.float64)
sc_X = ColumnTransformer([("",StandardScaler(),slice(0,10))])
x_train = sc_X.fit_transform(x_train)
x_test = sc_X.transform(x_test)
As you can see, I have one-hot encoded the output (Y) values, but I am not sure whether I need to do that in multiclass problems. In the next step I built the classifier as below.
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score

def build_classifier():
    classifier = Sequential()  # initialize the neural network
    classifier.add(Dense(units = 10, kernel_initializer = 'uniform', activation = 'relu', input_dim = x_train.shape[1]))
    classifier.add(Dense(units = 8, kernel_initializer = 'uniform', activation = 'relu'))
    classifier.add(Dropout(0.3))
    classifier.add(Dense(units = 8, kernel_initializer = 'uniform', activation = 'relu'))
    classifier.add(Dropout(0.3))
    classifier.add(Dense(units = 3, kernel_initializer = 'uniform', activation = 'softmax'))
    classifier.compile(optimizer = 'RMSprop', loss = 'categorical_crossentropy', metrics = ['accuracy'])
    return classifier
classifier = KerasClassifier(build_fn = build_classifier, batch_size = 32, epochs = 150)
accuracies = cross_val_score(estimator = classifier, X = x_train, y = y_train, cv = 10, n_jobs = -1)
accuracyMean = accuracies.mean()
classifier.fit(x_train, y_train)
Then I predicted on x_test with
y_pred = classifier.predict(x_test)
The problem is that I cannot get a predicted array (y_pred) with the same dimensions as y_test, which is one-hot encoded over the three possibilities. In y_pred I always get 0 (not habitable) or 2 (maybe habitable) and never 1 (habitable), and the output is a single column. I think the model's failure to predict the 1 (habitable) case comes from the rarity of that class in the dataset, but I still don't know why y_pred has a single column, and I can't find a good explanation of how to do multi-class classification with a Keras ANN on the internet.
Try adding class_weight, assigning a high weight to class 1:
class_weight = {0: 1.,
                1: 50.,
                2: 2.}
classifier.fit(x_train, y_train, class_weight = class_weight)
To get the class probabilities with the Keras scikit-learn wrapper:
y_pred = classifier.predict_proba(x_test)
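For reference, a short sketch of how the shapes line up, assuming the fitted classifier and the one-hot encoded y_test from the question:
probs = classifier.predict_proba(x_test)   # shape (n_samples, 3): one probability per class
pred_labels = probs.argmax(axis=1)         # class index with the highest probability
true_labels = y_test.argmax(axis=1)        # undo the one-hot encoding for comparison
The single-column shape of classifier.predict() is expected here: the scikit-learn wrapper returns class labels (the argmax of the probabilities), not the one-hot matrix.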

Why is my r2_score dependent on the units of the dependent variable

I have built a regression model using an ANN, relating 8 input parameters and 1 output parameter.
Code:
X = data.iloc[:,:-1]
y = data.iloc[:,8:9]*100
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train_us, X_test_us, y_train_us, y_test_us = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_Y = StandardScaler()
X_train = sc_X.fit_transform(X_train_us)
X_test = sc_X.transform(X_test_us)
y_train = sc_Y.fit_transform(y_train_us)
y_test = sc_Y.transform(y_test_us)
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
def base_model():
    # Initialising the ANN
    regressor = Sequential()
    # Adding the input layer and the first hidden layer
    regressor.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 8))
    # Adding the second hidden layer
    regressor.add(Dense(units = 4, kernel_initializer = 'uniform', activation = 'relu'))
    # Adding the output layer
    regressor.add(Dense(units = 1, kernel_initializer = 'uniform'))
    # Compiling the ANN
    regressor.compile(optimizer = 'adam', loss = 'mse', metrics = ['mae'])
    return regressor
# Fitting the ANN to the Training set
regressor = KerasRegressor(build_fn=base_model, epochs=500, batch_size=32)
regressor.fit(X_train,y_train)
# Predicting the Test & Train set with regressor built
y_pred = regressor.predict(X_test)
y_pred = sc_Y.inverse_transform(y_pred)
y_test = sc_Y.inverse_transform(y_test)
#calculate r2_score
from sklearn.metrics import r2_score
score_test = r2_score(y_test,y_pred)
I get an r2_score of 98%. The unit of my output variable is currently metres. If I multiply it by 100 to change it to centimetres, then train the model and calculate the r2_score, it is 91%.
Why is my r2_score changing with the unit of the dependent variable? Shouldn't scaling take care of this?
Thanks!!
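For what it is worth, r2_score itself is invariant to rescaling both arrays by the same factor, which is easy to check (a quick sketch, not part of the original post):
from sklearn.metrics import r2_score
import numpy as np
y_true = np.array([1.2, 3.4, 2.2])
y_hat = np.array([1.0, 3.0, 2.5])
print(np.isclose(r2_score(y_true, y_hat), r2_score(100 * y_true, 100 * y_hat)))  # True
So the difference between the two runs most likely comes from the training itself (for example random initialisation and other non-deterministic parts of fitting), not from the metric.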

Evaluating Regression Neural Network model's accuracy

I am new to machine learning and have created a neural network for a regression output. I have ~95,000 training examples and ~24,000 test examples. How can I evaluate my model and get the train and test errors? How do I judge the accuracy of this regression model? My Y variable ranges between 100 and 200, and X has 9 input features in the dataset.
Here is my code:
import pandas as pd
from keras.layers import Dense, Activation,Dropout
from keras.models import Sequential
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from matplotlib import pyplot
# Importing the dataset
dataset = pd.read_csv('data2csv.csv')
X = dataset.iloc[:,1:10].values
y = dataset.iloc[:, :1].values
# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# carve the validation set out of the training data so it does not overlap the test set
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size = 0.2, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_val = sc.transform(X_val)
X_test = sc.transform(X_test)
# Initialising the ANN
model = Sequential()
# Adding the input layer and the first hidden layer
model.add(Dense(10, activation = 'relu', input_dim = 9))
# Adding the second hidden layer
model.add(Dense(units = 5, activation = 'sigmoid'))
model.add(Dropout(0.2))
# Adding the third hidden layer
model.add(Dense(units = 5, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(units = 5, activation = 'relu'))
model.add(Dense(units = 5, activation = 'relu'))
# Adding the output layer
model.add(Dense(units = 1))
#model.add(Dense(1))
# Compiling the ANN
model.compile(optimizer = 'adam', loss = 'mean_squared_error',metrics=['mae','mse','mape','cosine'])
# Fitting the ANN to the Training set
history=model.fit(X_train, y_train,validation_data=(X_val, y_val) ,batch_size = 1000, epochs = 100)
test_loss = model.evaluate(X_test,y_test)
loss = history.history['loss']
acc = history.history['mean_absolute_error']
val_loss = history.history['val_loss']
val_acc = history.history['val_mean_absolute_error']
mape_loss=history.history['mean_absolute_percentage_error']
cosine_los=history.history['cosine_proximity']
pyplot.plot(history.history['mean_squared_error'])
pyplot.plot(history.history['mean_absolute_error'])
pyplot.plot(history.history['mean_absolute_percentage_error'])
pyplot.plot(history.history['cosine_proximity'])
pyplot.show()
epochs = range(1, len(loss)+1)
plt.plot(epochs, loss, 'ro', label='Training loss')
plt.legend()
plt.show()
y_pred = model.predict(X_test)
plt.plot(y_test, color = 'red', label = 'Real data')
plt.plot(y_pred, color = 'blue', label = 'Predicted data')
plt.title('Prediction')
plt.legend()
plt.show()
My test results after model.evaluate. Note that evaluate returns the loss followed by the four metrics specified in compile, so there are 5 values:
1) 84.69654303799824
2) 7.030169963975834
3) 84.69654303799824
4) 5.241855282313331
5) -0.9999999996023872
To evaluate your model you can use evaluate method:
test_loss = model.evaluate(X_test, y_test)
It returns the loss on the given test data computed using the same loss function you used during training (i.e. mean_squared_error).
Further, If you want to get training loss at the end of each epoch you can use History object which is returned by fit method:
history = model.fit(...)
loss = history.history['loss']
Here loss is a list containing the training loss value at the end of each epoch. If you used validation data when training the model (i.e. model.fit(..., validation_data=(X_val, y_val))) or any other metric such as mean absolute error (i.e. model.compile(..., metrics=['mae'])), you can access their values the same way:
acc = history.history['mae']
val_loss = history.history['val_loss']
val_acc = history.history['val_mae']
Bonus: To plot the training loss curve:
epochs = range(1, len(loss)+1)
plt.plot(epochs, loss, 'ro', label='Training loss')
plt.legend()
plt.show()
To show validation loss while training:
model.fit(X_train, y_train, batch_size = 1000, epochs = 100, validation_data = (X_test, y_test))
I don't think you can easily judge accuracy by plotting, since your input is 9-dimensional. You could plot the predicted y against each feature, just turn off the lines joining the dots, i.e. plt.plot(x, y, 'k.') (the '.' format plots markers only, with no connecting line), but I'm not sure how useful that will be.
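Since the question asks specifically for train and test errors, here is a small sketch of computing standard regression metrics on both splits, assuming the fitted model and the scaled arrays from the question:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np
y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)
print("train MAE:", mean_absolute_error(y_train, y_pred_train))
print("test MAE:", mean_absolute_error(y_test, y_pred_test))
print("test RMSE:", np.sqrt(mean_squared_error(y_test, y_pred_test)))
print("test R^2:", r2_score(y_test, y_pred_test))
For regression there is no single "accuracy"; MAE and RMSE (in the same units as Y, here roughly 100-200) and R^2 are the usual ways to summarise the error.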
