I've been following a tutorial to try to understand LSTMs and TensorFlow a bit better. The training of the model runs smoothly, but when I try to use the trained tokenizer on the test data and then convert the result to a NumPy array, it fails and I'm not really sure what the problem is. The relevant portion that goes wrong is below:
# test model
x_test = np.array(tokenizer.texts_to_sequences([str(txt) for txt in df_test['text'].values]))
The error it produces is below:
Traceback (most recent call last):
File "/Users/pranavnair/Documents/Code/wpd/wpd.py", line 85, in <module>
x_test = np.array(x_test_data)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (10824,) + inhomogeneous part.
I've tried using np.hstack instead of np.array, and that doesn't fix it. I'd appreciate any help at all; thanks in advance.
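For reference, texts_to_sequences returns a plain Python list of lists with varying lengths, which is exactly the kind of ragged input recent NumPy versions refuse to turn into an array. A minimal sketch of the usual workaround, using names from the full code below, is to pad first and drop the np.array call entirely:

from keras import utils

# texts_to_sequences yields a ragged list of lists; pad_sequences both
# pads/truncates to a fixed length and returns a proper 2-D numpy array.
x_test_seqs = tokenizer.texts_to_sequences([str(txt) for txt in df_test['text'].values])
x_test = utils.pad_sequences(x_test_seqs, maxlen=MAX_SEQUENCE_LENGTH)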
Full code below for reference:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras import utils
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
from keras.layers import Embedding
from keras.optimizers import Adam
# set random seed for reproducibility
RANDOM_SEED = 4
np.random.seed(RANDOM_SEED)
# import datasets
df_neut = pd.read_csv("./input/good.csv")
df_prom = pd.read_csv("./input/promotional.csv")
# clean up data to only include text
df_prom = df_prom.drop(df_prom.columns[1:], axis=1)
df_neut = df_neut.drop(df_neut.columns[1:], axis=1)
# combine datasets
df_neut.insert(1, 'label', 0) # neutral labels
df_prom.insert(1, 'label', 1) # promotional labels
# merge dataframes
df = pd.concat((df_neut, df_prom), ignore_index=True, axis=0)
# randomize order of dataframes
df = df.reindex(np.random.permutation(df.index))
# split into training and testing datasets
df_train, df_test = train_test_split(df, test_size=0.2, random_state=RANDOM_SEED)
# The maximum number of words to be used. (most frequent)
MAX_NB_WORDS = 50000
# perform data preprocessing using keras tokenizer
text_data = [str(txt) for txt in df_train['text'].values] # convert text data to strings
tokenizer = Tokenizer(num_words=MAX_NB_WORDS, filters='!"#$%&()*+,-./:;<=>?#[\]^_`{|}~', lower=True) # create tokenizer
tokenizer.fit_on_texts(text_data) # make dictionary
# vectorize dataset
x_train = tokenizer.texts_to_sequences(text_data)
# Max number of words in each sequence
MAX_SEQUENCE_LENGTH = 400
# pad sequence lengths
x_train = utils.pad_sequences(x_train, maxlen=MAX_SEQUENCE_LENGTH)
# get training labels
y_train = df_train['label'].values
# create sequential model
model = Sequential()
# create embedding layer
EMBEDDING_DIM = 100
model.add(Embedding(MAX_NB_WORDS+1, EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH))
# add LSTM layer to model
model.add(LSTM(80))
# setup model layers
model.add(Dropout(0.5))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
# setup binary classification via binary cross entropy loss
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
# train for four epochs
EPOCHS = 4
BATCH_SIZE = 64
history = model.fit(x_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE, validation_split=0.15)
# test model
x_test = np.array(tokenizer.texts_to_sequences([str(txt) for txt in df_test['text'].values]))
x_test = utils.pad_sequences(x_test, maxlen=MAX_SEQUENCE_LENGTH)
y_test = np.array(df_test['label'].values)
# evaluate model
scores = model.evaluate(x_test, y_test, batch_size=128)
print("The model has a test loss of %.2f and a test accuracy of %.1f%%" % (scores[0], scores[1]*100))
The accuracy of the following deep learning model is very high, but the Matthews Correlation Coefficient turns out to be very low. How can I increase the Matthews Correlation Coefficient? (I don't have real y values for the test set.)
The task is to use the training data (X_train.csv and y_train.csv) to train a model that can predict fraudulent transactions (isfraud=1), then verify the model's performance using the test data (X_test.csv); that is, to create a "y_test.csv" that predicts the "isfraud" variable for each "id" in the "x_test.csv" file.
This is my code, but it still needs improvement.
import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import f1_score, matthews_corrcoef
# mount google drive
from google.colab import drive
drive.mount('./mount')
# Store x_train in DataFrame
df_xtrain = pd.read_csv("mount/My Drive/Colab Notebooks/x_train.csv")
df_xtrain.shape
# Store y_train in DataFrame
df_ytrain = pd.read_csv("mount/My Drive/Colab Notebooks/y_train.csv")
df_ytrain.shape
# Store x_test in DataFrame - NOTE: it has one more column than x_train, which is "id"
df_xtest = pd.read_csv("mount/My Drive/Colab Notebooks/x_test.csv")
df_xtest.shape
# Check for missing values _ df_xtrain
print("Number of missing values: ", df_xtrain.isnull().sum().sum())
# Check for missing values _ df_ytrain
print("Number of missing values: ", df_ytrain.isnull().sum().sum())
# Check for missing values _df_xtest
print("Number of missing values: ", df_xtest.isnull().sum().sum())
# Impute missing values with the mean of each column
df_xtrain.fillna(df_xtrain.mean(), inplace=True)
scaler = MinMaxScaler()
xtrain = scaler.fit_transform(df_xtrain)
xtest = scaler.transform(df_xtest.iloc[:,1:])
# convert y_train to numpy array
ytrain = df_ytrain.to_numpy()
# number of input rows
nrows = xtrain.shape[0]
# number of columns
ncols = xtrain.shape[1]
# Define model with three hidden layers
model = Sequential()
model.add(Dense(200, input_dim=ncols, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(xtrain, ytrain, epochs=50, batch_size=10000)
# make class predictions with the model
ytest_prediction = model.predict(xtest)
# round the predictions to make it either 0 or 1
ytest = np.round(ytest_prediction)
# Extract the first 10000 samples from ytrain to use as the ground truth for ytest
ytrain_for_testing = ytrain[:10000, :]
# calculate the Matthews Correlation Coefficient
mcc = matthews_corrcoef(ytrain_for_testing.flatten(), ytest.flatten())
print("Matthews Correlation Coefficient:", mcc)
I am working on an NLP sentiment analysis model on Google Colab to classify the sentiment of a tweet (neutral, positive, negative) based on its content. I have prepped the test_x and train_x data into sequences of ints using the Tokenizer module. I followed the Tokenizer tutorial on the official TensorFlow YouTube channel, so there should be nothing wrong with that part.
However, when I begin to train the model, I run into UnimplementedError: Graph execution error.
I tried changing the layers of the model and decreasing the size of my datasets, but the same error still popped up every time.
Could anyone clarify what this error means and point out what is wrong with my code? Thanks!
import os
import sys
import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
device_name = tf.test.gpu_device_name()
if len(device_name) > 0:
    print("Found GPU at: {}".format(device_name))
else:
    device_name = "/device:CPU:0"
    print("No GPU, using {}.".format(device_name))
# Load dataset into a dataframe
train_data_path = "/content/drive/MyDrive/ML Datasets/tweet_sentiment_analysis/train.csv"
test_data_path = "/content/drive/MyDrive/ML Datasets/tweet_sentiment_analysis/test.csv"
train_df = pd.read_csv(train_data_path, encoding='unicode_escape')
test_df = pd.read_csv(test_data_path, encoding='unicode_escape')
train_df.head()
# Function to convert df into a list of strings
def convert_to_list(df, x):
    selected_text_list = []
    labels = []
    for index, row in df.iterrows():
        selected_text_list.append(str(row[x]))
        labels.append(str(row['sentiment']))
    return np.array(selected_text_list), np.array(labels)
train_sentences, train_labels = convert_to_list(train_df, 'selected_text')
test_sentences, test_labels = convert_to_list(test_df, 'text')
print(train_sentences)
print(train_labels)
# Instantiate tokenizer and create word_index
tokenizer = Tokenizer(num_words=1000, oov_token='<oov>')
tokenizer.fit_on_texts(train_sentences)
word_index = tokenizer.word_index
# Convert sentences into a sequence
train_sequence = tokenizer.texts_to_sequences(train_sentences)
test_sequence = tokenizer.texts_to_sequences(test_sentences)
# Padding sequences
pad_test_seq = pad_sequences(test_sequence, padding='post')
max_len = pad_test_seq[0].size
pad_train_seq = pad_sequences(train_sequence, padding='post', maxlen=max_len)
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 24, input_length=max_len),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
with tf.device(device_name):
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

num_epochs = 20

with tf.device(device_name):
    history = model.fit(pad_train_seq, train_labels, epochs=num_epochs, validation_data=(pad_test_seq, test_labels), verbose=2)
Here is a screenshot of the error:
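Since the screenshot isn't reproduced here, the following is an educated guess rather than a confirmed diagnosis: this UnimplementedError commonly comes from a Cast-string-to-float op in the graph, because train_labels still holds the raw sentiment strings, and a single sigmoid unit also doesn't match a three-class problem. A minimal sketch of the usual fix, reusing the variables from the code above (the exact label spellings are assumed):

import numpy as np
import tensorflow as tf

# Map the string sentiments to integer class ids (assumed label names).
label_map = {'negative': 0, 'neutral': 1, 'positive': 2}
train_label_ids = np.array([label_map[l] for l in train_labels])
test_label_ids = np.array([label_map[l] for l in test_labels])

# Three-class output: 3 softmax units instead of 1 sigmoid unit.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 24, input_length=max_len),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])

# sparse_categorical_crossentropy accepts integer labels directly.
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(pad_train_seq, train_label_ids, epochs=20,
                    validation_data=(pad_test_seq, test_label_ids), verbose=2)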
Hi, I want to use a ResNet for text data. I looked at some code examples for other kinds of data, and in the end I wrote the following code, but I'm not sure whether it is the correct way to build a ResNet or not.
NOTE: this part is optional. An opinion on it would be great, but I'm going to attempt it once the code above is corrected. If that is the correct approach, then I want to implement it this way: the ResNet should contain 18 layers in total, divided into four stages, with each stage consisting of two convolutional blocks. Each convolutional block should contain two convolutional layers with batch normalization and a ReLU non-linearity in between. Then the ResNet should pass the output from the convolutional layers to two fully-connected layers that use the reduced data to classify the initial data into a given website class. Last but not least, it should use the Adam optimizer and categorical cross-entropy (typically used for multi-class classification problems), with the optimal hyper-parameters identified and used.
import pandas as pd
import os
import numpy as np
from sklearn import metrics
from scipy.stats import zscore
from sklearn.model_selection import KFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
class ResNet_class():
    def __init__(self):
        # Cross-Validate
        self.no_of_folds = int(input('enter no of K_fold: '))
        self.kf = KFold(self.no_of_folds, shuffle=True, random_state=42)  # Use for KFold classification
        self.EPOCHS = int(input('enter no of epochs: '))

    def check_test(self):
        df = pd.read_csv(
            "https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv",
            na_values=['NA', '?'])
        df = pd.concat([df, pd.get_dummies(df['job'], prefix="job")], axis=1)
        df.drop('job', axis=1, inplace=True)
        df = pd.concat([df, pd.get_dummies(df['area'], prefix="area")], axis=1)
        df.drop('area', axis=1, inplace=True)
        df = pd.concat([df, pd.get_dummies(df['product'], prefix="product")], axis=1)
        df.drop('product', axis=1, inplace=True)
        med = df['income'].median()
        df['income'] = df['income'].fillna(med)
        df['income'] = zscore(df['income'])
        df['aspect'] = zscore(df['aspect'])
        df['save_rate'] = zscore(df['save_rate'])
        df['subscriptions'] = zscore(df['subscriptions'])
        x_columns = df.columns.drop('age').drop('id')
        x = df[x_columns].values
        y = df['age'].values
        oos_y = []
        oos_pred = []
        fold = 0
        for train, test in self.kf.split(x):
            fold += 1
            print(f"Fold #{fold}")
            x_train = x[train]
            y_train = y[train]
            x_test = x[test]
            y_test = y[test]
            model = Sequential()
            model.add(Dense(20, input_dim=x.shape[1], activation='relu'))
            model.add(Dense(10, activation='relu'))
            model.add(Dense(1))
            model.compile(loss='mean_squared_error', optimizer='adam')
            model.fit(x_train, y_train, validation_data=(x_test, y_test), verbose=0,
                      epochs=self.EPOCHS)
            pred = model.predict(x_test)
            oos_y.append(y_test)
            oos_pred.append(pred)
            score = np.sqrt(metrics.mean_squared_error(pred, y_test))
            print(f"Fold score (RMSE): {score}")
        oos_y = np.concatenate(oos_y)
        oos_pred = np.concatenate(oos_pred)
        score = np.sqrt(metrics.mean_squared_error(oos_pred, oos_y))
        print(f"Final, out of sample score (RMSE): {score}")
        oos_y = pd.DataFrame(oos_y)
        oos_pred = pd.DataFrame(oos_pred)
        oosDF = pd.concat([df, oos_y, oos_pred], axis=1)

resnet = ResNet_class()
resnet.check_test()
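As written, this builds a plain two-hidden-layer MLP, so it is not a ResNet: there are no skip connections. Purely as a reference for the note above, here is a minimal sketch of a residual block in that style (two Conv1D layers with batch normalization, a ReLU in between, and an identity shortcut) using the Keras functional API; the vocabulary size, sequence length, stage widths, and class count are all placeholder assumptions:

import tensorflow as tf

def residual_block(x, filters, kernel_size=3):
    """One residual block: two Conv1D layers with batch norm,
    a ReLU in between, and a shortcut added around them."""
    shortcut = x
    y = tf.keras.layers.Conv1D(filters, kernel_size, padding='same')(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Activation('relu')(y)
    y = tf.keras.layers.Conv1D(filters, kernel_size, padding='same')(y)
    y = tf.keras.layers.BatchNormalization()(y)
    # Project the shortcut with a 1x1 conv if the channel count changes.
    if shortcut.shape[-1] != filters:
        shortcut = tf.keras.layers.Conv1D(filters, 1, padding='same')(shortcut)
    y = tf.keras.layers.Add()([y, shortcut])
    return tf.keras.layers.Activation('relu')(y)

# Four stages of two blocks each over embedded text, then two
# fully-connected layers, roughly matching the 18-layer description.
inputs = tf.keras.Input(shape=(300,))                       # token ids (length assumed)
x = tf.keras.layers.Embedding(10000, 64)(inputs)            # vocab size assumed
for filters in (64, 128, 256, 512):                         # four stages
    for _ in range(2):                                      # two blocks per stage
        x = residual_block(x, filters)
    x = tf.keras.layers.MaxPooling1D(2)(x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
x = tf.keras.layers.Dense(128, activation='relu')(x)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)  # 10 classes assumed
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])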
I'm trying to train a model for text classification; the model takes a list of at most 300 integers embedded from articles. The model trains without any problem, but the accuracy won't go up.
The target consists of 41 categories, encoded as ints from 0 to 41, which were then normalized.
The table would look like this
Also, I don't know what my model should look like, since I referred to the two different examples below:
A binary classifier with one input column and one output column (Example 1)
A multi-class classifier with multiple columns as input (Example 2)
I have tried modifying my model based on both examples, but the accuracy won't change and even gets lower each epoch.
Should I add more layers to my model, or have I done something stupid that I haven't realized?
Note: If the 'df.pickle' download link is broken, use this link
from sklearn.model_selection import train_test_split
from urllib.request import urlopen
from os.path import exists
from os import mkdir
import tensorflow as tf
import pandas as pd
import pickle
# Define dataframe path
df_path = 'df.pickle'
# Check if local dataframe exists
if not exists(df_path):
    # Download binary from dropbox
    content = urlopen('https://ucd92a22d5e0d4d29b8edb608305.dl.dropboxusercontent.com/cd/0/get/Askx_25n3JI-jmnZsWXmMmRgd4O2EH1w9l0U6zCMq7xdSXs_IN_i2zuUviseqa9N7-WrReFbGhQi8CeseV5cNsFTO8dzRmSdxjr-MWEDQNpPaZ8Ik29E_58YAjY57qTc4CA/file#').read()
    # Write to file
    with open(df_path, 'wb') as file: file.write(content)
    # Load the dataframe from bytes
    df = pickle.loads(content)
# If the file exists (aka. downloaded)
else:
    # Load the dataframe from file
    df = pickle.load(open(df_path, 'rb'))
# Normalize the category
df['Category_Code'] = df['Category_Code'].apply(lambda x: x / 41)
train_df, test_df = [pd.DataFrame() for _ in range(2)]
x_train, x_test, y_train, y_test = train_test_split(df['Content_Parsed'], df['Category_Code'], test_size=0.15, random_state=8)
train_df['Content_Parsed'], train_df['Category_Code'] = x_train, y_train
test_df['Content_Parsed'], test_df['Category_Code'] = x_test, y_test
# Variable containing the number of words we want to keep in our vocabulary
NUM_WORDS = 10000
# Input/Token length
SEQ_LEN = 300
# Create tokenizer for our data
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=NUM_WORDS, oov_token='<UNK>')
tokenizer.fit_on_texts(train_df['Content_Parsed'])
# Convert text data to numerical indexes
train_seqs=tokenizer.texts_to_sequences(train_df['Content_Parsed'])
test_seqs=tokenizer.texts_to_sequences(test_df['Content_Parsed'])
# Pad data up to SEQ_LEN (note that we truncate if there are more than SEQ_LEN tokens)
train_seqs=tf.keras.preprocessing.sequence.pad_sequences(train_seqs, maxlen=SEQ_LEN, padding="post")
test_seqs=tf.keras.preprocessing.sequence.pad_sequences(test_seqs, maxlen=SEQ_LEN, padding="post")
# Create Models folder if not exists
if not exists('Models'): mkdir('Models')
# Define local model path
model_path = 'Models/model.pickle'
# Check if model exists/pre-trained
if not exists(model_path):
    # Define word embedding size
    EMBEDDING_SIZE = 16
    # Create new model
    '''
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(NUM_WORDS, EMBEDDING_SIZE),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(EMBEDDING_SIZE)),
        # tf.keras.layers.Dense(EMBEDDING_SIZE, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    '''
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(NUM_WORDS, EMBEDDING_SIZE),
        # tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(EMBEDDING_SIZE)),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(EMBEDDING_SIZE, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    # Compile the model
    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    # Stop training when a monitored quantity has stopped improving.
    es = tf.keras.callbacks.EarlyStopping(monitor='val_acc', mode='max', patience=1)
    # Define batch size (can be tuned to improve model accuracy)
    BATCH_SIZE = 16
    # Define number of cycles to train
    EPOCHS = 20
    # Use the GPU (if this errors, you don't have a GPU; use the CPU instead)
    with tf.device('/GPU:0'):
        # Train/Fit the model
        history = model.fit(
            train_seqs,
            train_df['Category_Code'].values,
            batch_size=BATCH_SIZE,
            epochs=EPOCHS,
            validation_split=0.2,
            validation_steps=30,
            callbacks=[es]
        )
    # Evaluate the model
    model.evaluate(test_seqs, test_df['Category_Code'].values)
    # Save the model into a file
    with open(model_path, 'wb') as file: file.write(pickle.dumps(model))
else:
    # Load the model
    model = pickle.load(open(model_path, 'rb'))
# Check the model
model.summary()
After two days of tweaking and studying more examples, I found this website, which explains multi-class classification quite well.
The details of the changes I made are as follows:
Since I'm going to build a model for multiple classes, during compilation the model should use categorical_crossentropy as its loss function instead of binary_crossentropy.
The model should produce a number of outputs equal to the total number of classes you're going to classify, which in my case is 41 (one-hot encoding).
The last layer's activation function should be "softmax", since we're choosing the label with the highest confidence level (closest to 1.0).
You will need to tweak the layers accordingly based on the number of classes you're going to classify. See here on how to improve your model.
My final code looks something like this:
from sklearn.model_selection import train_test_split
from urllib.request import urlopen
from functools import reduce
from os.path import exists
from os import listdir
from sys import exit
import tensorflow as tf
import pandas as pd
import pickle
import re
# Specify dataframe path
df_path = 'df.pickle'
# Check if the file exists
if not exists(df_path):
    # Specify url of the dataframe binary
    url = 'https://www.dropbox.com/s/76hibe24hmpz3bk/df.pickle?dl=1'
    # Read the byte content from url
    content = urlopen(url).read()
    # Write the raw bytes (already a pickle) to a file to save time later
    with open(df_path, 'wb') as file: file.write(content)
    # Unpickle the dataframe
    df = pickle.loads(content)
else:
    # Load the pickled dataframe
    df = pickle.load(open(df_path, 'rb'))
# Useful variables
MAX_NUM_WORDS = 50000 # Vocabulary size for our tokenizer
MAX_SEQ_LENGTH = 600 # Maximum length of tokens (for padding later)
EMBEDDING_SIZE = 256 # Embedding size (Tweak to improve accuracy)
OUTPUT_LENGTH = len(df['Category'].unique()) # Number of class to be classified
# Create our tokenizer
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=MAX_NUM_WORDS, lower=True)
# Fit our tokenizer with words/tokens
tokenizer.fit_on_texts(df['Content_Parsed'].values)
# Get our token vocabulary
word_index = tokenizer.word_index
print('Found {} unique tokens'.format(len(word_index)))
# Parse our text into sequence of numbers using our tokenizer
X = tokenizer.texts_to_sequences(df['Content_Parsed'].values)
# Pad the sequence up to the MAX_SEQ_LENGTH
X = tf.keras.preprocessing.sequence.pad_sequences(X, maxlen=MAX_SEQ_LENGTH)
print('Shape of feature tensor: {}'.format(X.shape))
# Convert our labels into dummy variable (More info on the link provided above)
Y = pd.get_dummies(df['Category']).values
print('Shape of label tensor: {}'.format(Y.shape))
# Split our features and labels into test and train dataset
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.1, random_state=42)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
# Creating our model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Embedding(MAX_NUM_WORDS, EMBEDDING_SIZE, input_length=MAX_SEQ_LENGTH))
model.add(tf.keras.layers.SpatialDropout1D(0.2))
# The number 64 could be changed based on your model performance
model.add(tf.keras.layers.LSTM(64, dropout=0.2, recurrent_dropout=0.2))
# Our output layer with length similar to the OUTPUT_LENGTH
model.add(tf.keras.layers.Dense(OUTPUT_LENGTH, activation='softmax'))
# Compile our model with "categorical_crossentropy" loss function
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Model variables
EPOCHS = 100 # Number of cycle to run (The early stopping may stop the training process accordingly)
BATCH_SIZE = 64 # Batch size (Tweaking this may improve model performance a bit)
checkpoint_path = 'model_checkpoints' # Checkpoint path of our model
# Use GPU if available
with tf.device('/GPU:0'):
    # Fit/Train our model
    history = model.fit(
        x_train, y_train,
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        validation_split=0.1,
        callbacks=[
            tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0.0001),
            tf.keras.callbacks.ModelCheckpoint(
                checkpoint_path,
                monitor='val_acc',
                save_best_only=True,
                save_weights_only=False
            )
        ],
        verbose=1
    )
Now my model's training accuracy performs well and increases each epoch, but since the validation accuracy (val_acc, around 76-77 percent) is not performing as well, I may need to tweak the model/layers a bit.
The output snapshot is provided below
I am quite new to Keras, so apologies in advance for any stupid mistakes. I am currently attempting some good old cross-domain transfer learning between two datasets. I have a model here that is trained and executed on a voice recognition dataset that I have generated (code is at the bottom of this question because it's quite long).
If I were to train a new model, say model_2 on a different dataset, then I'd get a baseline from the initial random distribution of weights.
I wonder: is it possible to train model_1 and model_2, and then (this is the bit I don't know how to do) take the 256- and 128-unit dense layers from model_1, with their trained weights, and use them as the starting point for a model_3, which trains on dataset 2 with the initial weight distribution from model_1?
So, in the end, I have the following:
Model_1 which starts from a random distribution and trains on dataset 1
Model_2 which starts from a random distribution and trains on dataset 2
Model_3 which starts from the distribution trained in Model_1 and trains on dataset 2.
My question is: how would I go about doing step 3 above? I don't want to freeze the weights; I just want an initial distribution for training from a past experiment.
Any help would be greatly appreciated. Thank you! Apologies if I didn't make it quite clear what I'm going for.
My code to train Model_1 is as follows:
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt
from keras.utils import np_utils
from keras.layers.normalization import BatchNormalization
import time
start = time.clock()
# fix random seed for reproducibility
seed = 1
numpy.random.seed(seed)
# load dataset
dataframe = pandas.read_csv("voice.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
numVars = len(dataframe.columns) - 1
numClasses = dataframe[numVars].nunique()
X = dataset[:,0:numVars].astype(float)
Y = dataset[:,numVars]
print("THERE ARE " + str(numVars) + " ATTRIBUTES")
print("THERE ARE " + str(numClasses) + " UNIQUE CLASSES")
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
calls = [EarlyStopping(monitor='acc', min_delta=0.0001, patience=100, verbose=2, mode='max', restore_best_weights=True)]
# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(BatchNormalization())
    model.add(Dense(256, input_dim=numVars, activation='sigmoid'))
    model.add(Dense(128, activation='sigmoid'))
    model.add(Dense(numClasses, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
estimator = KerasClassifier(build_fn=baseline_model, epochs=2000, batch_size=1000, verbose=1)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold, fit_params={'callbacks':calls})
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
#your code here
print (time.clock() - start)
PS: The input attributes and outputs will all be the same between the two datasets; all that will change are the attribute values. I am curious: can this be done if the two datasets have different numbers of output classes?
In short, to fine-tune Model_3 from Model_1, just call model.load_weights('/path/to/model_1.h5', by_name=True) after model.compile(...). Of course, you must have saved the trained Model_1 first.
If I understood correctly, you have the same number of features and classes across the two datasets, so you do not even need to redesign your model. If you had a different set of classes, then you would have to give different names to the last layers of Model_1 and Model_3:
model.add(Dense(numClasses, activation='softmax', name='some_unique_name'))
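A minimal end-to-end sketch of that workflow under the same assumptions (the file name, the layer names, and the dataset-2 arrays X2/dummy_y2 are placeholders, not from the original post):

from keras.models import Sequential
from keras.layers import Dense

def build_model(num_vars, num_classes, out_name='out'):
    # Identical architectures; matching layer names are what let
    # load_weights(..., by_name=True) copy weights across models.
    model = Sequential()
    model.add(Dense(256, input_dim=num_vars, activation='sigmoid', name='dense_a'))
    model.add(Dense(128, activation='sigmoid', name='dense_b'))
    model.add(Dense(num_classes, activation='softmax', name=out_name))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Model_1: random init, trained on dataset 1, then saved.
model_1 = build_model(numVars, numClasses)
model_1.fit(X, dummy_y, epochs=2000, batch_size=1000)
model_1.save('model_1.h5')

# Model_3: same architecture, weights initialized from Model_1,
# then trained (not frozen) on dataset 2.
model_3 = build_model(numVars, numClasses)
model_3.load_weights('model_1.h5', by_name=True)
model_3.fit(X2, dummy_y2, epochs=2000, batch_size=1000)

# With a different number of classes, pass a different out_name for
# Model_3; by_name=True then skips the mismatched output layer.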