I have a few datasets. I trained a model on the largest one and now want to see how it predicts values for a different dataset. To save the model, I used the ModelCheckpoint callback with save_weights_only=True, and later I also added a model.save(...) call. Now I would like to use this model to predict on another dataset. How do I do this properly? My biggest concern is that, before training, I shifted my "reference" dataset by a number of samples so that the model predicts X samples ahead, and I'm not sure how it will behave when I call model.predict() on another dataset: will it predict for the whole dataset or only for the number of shifted values? My second question: would it be more reasonable to save the full model with ModelCheckpoint (that is, remove the save_weights_only=True argument), or is my current approach fine?
I'm using Tensorflow 1.14.
I was following the Hvass Labs tutorial on RNN time-series prediction; my code looks like this:
df =pd.read_json(f'dataset1.json')
input_data = self.df.values[0:-10]
output_data = self.df.values[:-10]
num_data = len(input_data)
num_train = int(0.9 * num_data)
num_test = num_data - num_train
input_train = input_data[0:num_train]
input_test = input_data[num_train:]
output_train = output_data[0:num_train]
output_test = output_data[num_train:]
num_input_signals = input_data.shape[1]
num_output_signals = output_data.shape[1]
Then I've declared the model:
model = Sequential()
input_shape=(None, num_input_signals,)))
def __batch_generator(self, batch_size, sequence_length, input_train, output_train):
    while True:
        x_shape = (batch_size, sequence_length, num_input_signals)
        x_batch = np.zeros(shape=x_shape, dtype=np.float16)
        y_shape = (batch_size, sequence_length, num_output_signals)
        y_batch = np.zeros(shape=y_shape, dtype=np.float16)
        for i in range(batch_size):
            idx = np.random.randint(num_train - sequence_length)
            x_batch[i] = input_train[idx:idx + sequence_length]
            y_batch[i] = output_train[idx:idx + sequence_length]
        yield (x_batch, y_batch)
validation_data = np.expand_dims(input_test, axis=0), np.expand_dims(output_test, axis=0)
optimizer = RMSprop(lr=0.001)
generator = self.__batch_generator(256, 168, input_train, output_train)
model.compile(loss='mean_squared_error', optimizer='adam')
callback_checkpoint = ModelCheckpoint(filepath=f'model',
As you can see, I shifted my data by 10 samples, so the last 10 rows of the output dataframe are NaN. As far as I understand TensorFlow, the code above should predict values for 10 samples. Later I would like to use this trained model for prediction on other datasets. Does that mean it will predict only the last 10 samples, or will it run the prediction for the whole dataset? My "reference" dataset looks like this:
"timestamp": "2019-02-11 08:00:00",
"sine": -0.5877852522924633
"timestamp": "2019-02-11 09:00:00",
"sine": -0.809016994374933
"timestamp": "2019-02-11 10:00:00",
"sine": -0.9510565162951421
The other dataset has fewer samples (10k versus 100k) and contains a cosine function. To load the model, I want to use tensorflow.keras.models.load_model.
Lastly, is it possible that a model trained on the 100k dataset will, when given the 10k dataset, perform worse than a model trained on the 10k dataset? Since the first model was trained on a bigger dataset, I assume that during prediction it will expect a similar amount of data to predict properly. Am I right?
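To make the shifting concrete, here is a minimal NumPy-only sketch of how inputs and targets line up when the targets are moved 10 samples into the future. The synthetic sine series and the names signal, x_data and y_data are assumptions for illustration, not the code from the question. If the model returns one output per time step (as the tutorial's model does, which is an assumption about the truncated model definition above), predict returns as many rows as it is given, and each row is an estimate 10 samples ahead of the corresponding input row.

```python
import numpy as np

shift_steps = 10  # how many samples ahead the target lies (10 in the question)

# Hypothetical single-signal series standing in for the "sine" column.
signal = np.sin(np.linspace(0, 20 * np.pi, 1000)).reshape(-1, 1)

# Inputs: every row except the last `shift_steps`, which have no future target.
x_data = signal[:-shift_steps]
# Targets: the same series moved `shift_steps` rows into the future.
y_data = signal[shift_steps:]

# Row i of y_data is the value observed shift_steps samples after row i of x_data.
print(x_data.shape, y_data.shape)
```

Under these assumptions the shift only decides which target is paired with each input row during training; it does not limit how many rows can be passed to predict later.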
I am building a multi-input network using the Keras functional API, but I struggle to find and understand the right format for feeding my input data through the network.
I have two main inputs:
One is an image, which goes through a fine-tuned ResNet50 CNN.
The second is a simple NumPy array (X_train) containing metadata about the image (position and size of the image). This one goes through a simple dense network.
I load the images from a dataframe, containing the metadata, and the filepath to the corresponding image.
I use ImageDataGenerator and the flow_from_dataframe method to load my images :
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
train_flow = datagen.flow_from_dataframe(
I can train the two networks separately using their own data; no problems so far.
The outputs of the two networks are then combined and fed to a dense network that outputs a 10-element probability vector:
# Create the input for the final dense network using the output of both the dense MLP and CNN
combinedInput = concatenate([cnn.output, mlp.output])
x = Dense(512, activation="relu")(combinedInput)
x = Dense(256, activation="relu")(x)
x = Dense(128, activation="relu")(x)
x = Dense(32, activation="relu")(x)
x = Dense(10, activation="softmax")(x)
model = Model(inputs=[cnn.input, mlp.input], outputs=x)
# Compile the model
opt = Adam(lr=1e-3, decay=1e-3 / 200)
# Train the model
model_history = model.fit(x=(train_flow, X_train),
However, I cannot train the overall network. I get the following error:
ValueError: Failed to find data adapter that can handle input: (<class 'tuple'> containing values of types {"<class 'keras_preprocessing.image.dataframe_iterator.DataFrameIterator'>", "<class 'numpy.ndarray'>"}), <class 'pandas.core.series.Series'>
I understand I am not using the correct input format for my input data.
I can train my CNN with the train_flow, and my dense network with X_train, so I was hoping this would work.
Do you have any idea how to combine image data and a NumPy array into a multi-input array?
Thank you for all the information you can give me!
I finally found out how to do it, inspired by the post Nima Aghli proposed.
Here is how I did it:
First, define the preprocessing function (for me, the one used for ResNet50):
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
def preprocess_function(x):
    if x.ndim == 3:
        x = x[np.newaxis, :, :, :]
    return preprocess_input(x)
# Initializing the datagen, using the above function :
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
Then define the custom data generator that yields randomly sampled batches pairing images and metadata, while making sure it never runs out of data (so that you can train for any number of epochs):
def createGenerator(dff, verif=False, batch_size=BATCH_SIZE):
    # Shuffles the dataframe, and so the batches as well
    dff = dff.sample(frac=1)
    # Shuffle=False is EXTREMELY important to keep order of image and coord
    flow = datagen.flow_from_dataframe(
    idx = 0
    n = len(dff) - batch_size
    batch = 0
    while True:
        # Get next batch of images
        X1 = flow.next()
        # idx to reach
        end = idx + X1[0].shape[0]
        # get next batch of lines from df
        X2 = dff[["x", "y", "w", "h"]][idx:end].to_numpy()
        dff_verif = dff[idx:end]
        # Updates the idx for the next batch
        idx = end
        # print("batch nb : ", batch, ", batch_size : ", X1[0].shape[0])
        # Checks if we are at the end of the dataframe
        if idx == len(dff):
            idx = 0
        # Yields the image, metadata & target batches
        if verif:
            yield [X1[0], X2], X1[1], dff_verif
        else:
            yield [X1[0], X2], X1[1]  # Yield both images, metadata and their mutual label
I deliberately kept the comments, as they help to follow all the operations being performed.
The main point (and problem) is to draw images from the whole dataframe without ever running short of images, while keeping batches of the same size.
Also, we have to be careful with the order of the images and metadata, so that the right information is paired with the right image in the returned array.
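The same pairing idea can be sketched without ImageDataGenerator, assuming the images already sit in memory as a NumPy array (a simplification that is not what the code above does). The function name paired_batches and the toy arrays are hypothetical. The key point is that a single shuffled index selects rows from all three arrays, so images, metadata and labels cannot drift apart.

```python
import numpy as np

def paired_batches(images, metadata, labels, batch_size, seed=0):
    """Yield ([image_batch, metadata_batch], label_batch) forever, rows kept aligned."""
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:
        order = rng.permutation(n)  # one shared shuffle for all three arrays
        # Stop before a short final batch so every batch has the same size;
        # the leftover rows get another chance after the next reshuffle.
        for start in range(0, n - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            yield [images[idx], metadata[idx]], labels[idx]

# Toy data: sample i is filled with the value i everywhere, so alignment is easy to check.
images = np.arange(10).reshape(-1, 1, 1, 1) * np.ones((1, 2, 2, 3))
metadata = np.repeat(np.arange(10).reshape(-1, 1), 4, axis=1)
labels = np.arange(10)

gen = paired_batches(images, metadata, labels, batch_size=4)
(image_batch, meta_batch), label_batch = next(gen)
```

Because the generator never terminates, it can be combined with steps_per_epoch for any number of epochs.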
I am trying to create a dataset for audio recognition with a simple Keras sequential model.
This is the function I am using to create the model:
def dnn_model(input_shape, output_shape):
    model = keras.Sequential()
    model.add(layers.Dense(512, activation = "relu"))
    model.add(layers.Dense(output_shape, activation = "softmax"))
    model.compile( optimizer='adam',
    return model
I am generating my training data with this generator function:
def generator(x_dirs, y_dirs, hmm, sampling_rate, parameters):
    window_size_samples = tools.sec_to_samples(parameters['window_size'], sampling_rate)
    window_size_samples = 2**tools.next_pow2(window_size_samples)
    hop_size_samples = tools.sec_to_samples(parameters['hop_size'],sampling_rate)
    for i in range(len(x_dirs)):
        features = fe.compute_features_with_context(x_dirs[i],**parameters)
        praat = tools.praat_file_to_target( y_dirs[i],
        yield features,praat
The variables x_dirs and y_dirs contain lists of paths to the audio files and the labels, respectively. In total I have 8623 files to train my model. This is how I train my model:
def train_model(model, model_dir, x_dirs, y_dirs, hmm, sampling_rate, parameters, steps_per_epoch=10,epochs=10):
    model.fit((generator(x_dirs, y_dirs, hmm, sampling_rate, parameters)),
    return model
My problem now is that if I pass all 8623 files, it uses all 8623 files to train the model in the first epoch and then complains after the first epoch that it needs steps_per_epoch * epochs batches to train the model.
I tested this with only 10 of the 8623 files by slicing the list, but then TensorFlow complains that 100 batches are needed.
So how do I make my generator yield data in the way that works best? I always thought that steps_per_epoch just limits the data received per epoch.
The fit function is going to exhaust your generator: once it has yielded all 8623 batches, it cannot yield any more.
You can solve the issue like this:
def generator(x_dirs, y_dirs, hmm, sampling_rate, parameters, epochs=1):
    for epoch in range(epochs): # or while True:
        window_size_samples = tools.sec_to_samples(parameters['window_size'], sampling_rate)
        window_size_samples = 2**tools.next_pow2(window_size_samples)
        hop_size_samples = tools.sec_to_samples(parameters['hop_size'],sampling_rate)
        for i in range(len(x_dirs)):
            features = fe.compute_features_with_context(x_dirs[i],**parameters)
            praat = tools.praat_file_to_target( y_dirs[i],
            yield features,praat
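The difference between a generator that stops and one that restarts can be seen without Keras at all. The sketch below uses only the standard library; the file names and the two helper functions are hypothetical stand-ins for the real feature extraction. With a plain Python generator, fit draws steps_per_epoch batches per epoch from the same generator object, so it asks for steps_per_epoch * epochs batches in total.

```python
import itertools

def finite_batches(items):
    """Yields each item once and then stops, like the generator in the question."""
    for item in items:
        yield item

def endless_batches(items):
    """Starts over from the first item whenever the list is exhausted."""
    while True:
        for item in items:
            yield item

files = ["a.wav", "b.wav", "c.wav"]  # hypothetical file names
steps_per_epoch, epochs = 2, 3       # training would request 2 * 3 = 6 batches

needed = steps_per_epoch * epochs
from_finite = list(itertools.islice(finite_batches(files), needed))
from_endless = list(itertools.islice(endless_batches(files), needed))

print(len(from_finite), "of", needed, "batches from the finite generator")
print(len(from_endless), "of", needed, "batches from the endless generator")
```

With the endless version, steps_per_epoch decides where one epoch ends; setting it to the number of files divided by the batch size makes one epoch correspond to one pass over the data.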
I'm trying to make a simple neural network with Keras, but my weights won't update after calling fit()
To test the model, I created a simple data set, called mem. mem is a deque of tuples. mem[i][0] gives a np.array of size inp_len of only ones or only zeros.
Here is my code:
inp_len = 5*3 + 3187*4
model = Sequential()
model.add(Dense(units=124, kernel_initializer='ones', input_shape = (inp_len,)))
model.add(Dense(48, kernel_initializer='ones'))
model.add(Dense(48, kernel_initializer='ones'))
model.add(Dense(48, kernel_initializer='ones'))
model.add(Dense(1, activation = 'sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=Adam(lr=learning_rate, decay=learning_rate_decay))
batch_size = 20
batch_old = random.sample(mem, min(len(mem), batch_size))
for i_batch in range(len(batch_old)):
    X = batch_old[i_batch][0].reshape(1,inp_len)
    y = np.array([[X[0]]])
    model.fit(X, y, epochs = 1, batch_size = 1)
I use 1 epoch with a batch size of 1, because I want to use model.predict() in another part of the code with a different batch size.
Can someone please explain why model.get_weights()[0] keeps returning ones after fitting the model?
I want to replace the loss function related to my neural network during training, this is the network:
model = tensorflow.keras.models.Sequential()
model.add(tensorflow.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=input_shape))
model.add(tensorflow.keras.layers.Conv2D(64, (3, 3), activation="relu"))
model.add(tensorflow.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tensorflow.keras.layers.Dense(128, activation="relu"))
model.add(tensorflow.keras.layers.Dense(output_classes, activation="softmax"))
model.compile(loss=tensorflow.keras.losses.categorical_crossentropy, optimizer=tensorflow.keras.optimizers.Adam(0.001), metrics=['accuracy'])
history = model.fit(x_train, y_train, batch_size=128, epochs=5, validation_data=(x_test, y_test))
Now I want to replace tensorflow.keras.losses.categorical_crossentropy with another loss, so I did this:
model.compile(loss=tensorflow.keras.losses.mse, optimizer=tensorflow.keras.optimizers.Adam(0.001), metrics=['accuracy'])
history = model.fit(x_improve, y_improve, epochs=1, validation_data=(x_test, y_test)) #FIXME bug during training
but I get this error:
ValueError: No gradients provided for any variable: ['conv2d/kernel:0', 'conv2d/bias:0', 'conv2d_1/kernel:0', 'conv2d_1/bias:0', 'dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0'].
Why? How can I fix it? Is there another way to change the loss function?
I'm currently working on Google Colab with TensorFlow and Keras, and I was not able to recompile a model while maintaining the weights. Every time I recompile a model like this:
with strategy.scope():
model = hd_unet_model(INPUT_SIZE)
loss=tf.keras.losses.MeanSquaredError() ,
the weights get reset.
So I found another solution. All you need to do is:
Get the model with the weights you want (load it or something else).
Get the weights of the model like this:
weights = model.get_weights()
Recompile the model (to change the loss function).
Set the weights of the recompiled model again like this:
Launch the training.
I tested this method and it seems to work.
So to change the loss mid-training you can:
Compile with the first loss.
Train on the first loss.
Save the weights.
Recompile with the second loss.
Load the weights.
Train on the second loss.
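The six steps can be sketched on a toy model as follows. This is only an illustration under stated assumptions: the tiny Dense network, the random data and the choice of mean_absolute_error as the second loss are all hypothetical, and the example runs without a distribution strategy. It relies on the standard Keras methods get_weights and set_weights.

```python
import numpy as np
from tensorflow import keras

# Tiny stand-in model; the real architecture does not matter for the pattern.
model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(4, activation="relu"),
    keras.layers.Dense(1),
])

x = np.random.rand(32, 3).astype("float32")  # hypothetical inputs
y = np.random.rand(32, 1).astype("float32")  # hypothetical targets

# Steps 1 and 2: compile with the first loss and train on it.
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(x, y, epochs=1, verbose=0)

# Step 3: keep a copy of the trained weights (a list of NumPy arrays).
weights = model.get_weights()

# Step 4: recompile with the second loss.
model.compile(loss="mean_absolute_error", optimizer="adam")

# Step 5: put the saved weights back, in case anything reset them.
model.set_weights(weights)
restored = model.get_weights()

# Step 6: continue training on the second loss.
model.fit(x, y, epochs=1, verbose=0)
```

Note that the optimizer is created anew by the second compile call, so its internal state (for example Adam's moment estimates) starts from scratch even though the layer weights are preserved.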
So, the straightforward answer I would give is: switch to PyTorch if you want to play this kind of game. Since in PyTorch you define your own training and evaluation functions, it takes just an if statement to switch from one loss function to another.
Also, I see in your code that you want to switch from cross_entropy to mean_square_error. The former is suitable for classification and the latter for regression, so this is not really something you can do. In the code that follows, I switch from mean squared error to mean squared logarithmic error, which are both losses suitable for regression.
Although other answers offer solutions to your question (see change-loss-function-dynamically-during-training), it is not clear whether you can trust the results. Some people found that, even with a customised function, Keras sometimes keeps training with the first loss.
My solution is based on train_on_batch, which allows us to train a model in a for loop and therefore stop training whenever we want in order to recompile the model with a new loss function. Please note that recompiling the model does not reset the weights (see: Does recompiling a model re-initialize the weights?).
The dataset used is the Boston housing dataset.
# Regression Example With Boston Dataset: Standardized and Larger
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from keras.losses import mean_squared_error, mean_squared_logarithmic_error
from matplotlib import pyplot
import matplotlib.pyplot as plt
# load dataset
dataframe = read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:13]
y = dataset[:,13]
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.33, random_state=42)
# create model
model = Sequential()
model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
model.add(Dense(6, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
batch_size = 25
# have to define manually a dict to store all epochs scores
history = {}
history['history'] = {}
history['history']['loss'] = []
history['history']['mean_squared_error'] = []
history['history']['mean_squared_logarithmic_error'] = []
history['history']['val_loss'] = []
history['history']['val_mean_squared_error'] = []
history['history']['val_mean_squared_logarithmic_error'] = []
# first compiling with mse
model.compile(loss='mean_squared_error', optimizer='adam', metrics=[mean_squared_error, mean_squared_logarithmic_error])
# define number of iterations in training and test
train_iter = round(trainX.shape[0]/batch_size)
test_iter = round(testX.shape[0]/batch_size)
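for epoch in range(2):
    # train iterations
    loss, mse, msle = 0, 0, 0
    for i in range(train_iter):
        start = i*batch_size
        end = i*batch_size + batch_size
        batchX = trainX[start:end,]
        batchy = trainy[start:end,]
        loss_, mse_, msle_ = model.train_on_batch(batchX,batchy)
        loss += loss_
        mse += mse_
        msle += msle_
    # store the epoch averages (assumed step: the plots further down read these lists)
    history['history']['loss'].append(loss/train_iter)
    history['history']['mean_squared_error'].append(mse/train_iter)
    history['history']['mean_squared_logarithmic_error'].append(msle/train_iter)
    # test iterations
    val_loss, val_mse, val_msle = 0, 0, 0
    for i in range(test_iter):
        start = i*batch_size
        end = i*batch_size + batch_size
        batchX = testX[start:end,]
        batchy = testy[start:end,]
        val_loss_, val_mse_, val_msle_ = model.test_on_batch(batchX,batchy)
        val_loss += val_loss_
        val_mse += val_mse_
        val_msle += val_msle_
    history['history']['val_loss'].append(val_loss/test_iter)
    history['history']['val_mean_squared_error'].append(val_mse/test_iter)
    history['history']['val_mean_squared_logarithmic_error'].append(val_msle/test_iter)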
# recompiling the model with new loss
model.compile(loss='mean_squared_logarithmic_error', optimizer='adam', metrics=[mean_squared_error, mean_squared_logarithmic_error])
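for epoch in range(2):
    # train iterations
    loss, mse, msle = 0, 0, 0
    for i in range(train_iter):
        start = i*batch_size
        end = i*batch_size + batch_size
        batchX = trainX[start:end,]
        batchy = trainy[start:end,]
        loss_, mse_, msle_ = model.train_on_batch(batchX,batchy)
        loss += loss_
        mse += mse_
        msle += msle_
    # store the epoch averages (assumed step: the plots further down read these lists)
    history['history']['loss'].append(loss/train_iter)
    history['history']['mean_squared_error'].append(mse/train_iter)
    history['history']['mean_squared_logarithmic_error'].append(msle/train_iter)
    # test iterations
    val_loss, val_mse, val_msle = 0, 0, 0
    for i in range(test_iter):
        start = i*batch_size
        end = i*batch_size + batch_size
        batchX = testX[start:end,]
        batchy = testy[start:end,]
        val_loss_, val_mse_, val_msle_ = model.test_on_batch(batchX,batchy)
        val_loss += val_loss_
        val_mse += val_mse_
        val_msle += val_msle_
    history['history']['val_loss'].append(val_loss/test_iter)
    history['history']['val_mean_squared_error'].append(val_mse/test_iter)
    history['history']['val_mean_squared_logarithmic_error'].append(val_msle/test_iter)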
# Some plots to check what is going on
# loss function
pyplot.plot(history['history']['loss'], label='train')
pyplot.plot(history['history']['val_loss'], label='test')
# Only mean squared error
pyplot.title('Mean Squared Error')
pyplot.plot(history['history']['mean_squared_error'], label='train')
pyplot.plot(history['history']['val_mean_squared_error'], label='test')
# Only mean squared logarithmic error
pyplot.title('Mean Squared Logarithmic Error')
pyplot.plot(history['history']['mean_squared_logarithmic_error'], label='train')
pyplot.plot(history['history']['val_mean_squared_logarithmic_error'], label='test')
The resulting plot confirms that the loss function changes after the second epoch:
The drop in the loss is due to the model switching from plain mean squared error to the logarithmic one, which has much lower values. Printing the scores also proves that the loss in use truly changed:
[599.5209197998047, 570.4041115897043, 3.8622902120862688, 2.1578191178185597]
[599.5209197998047, 570.4041115897043, 510.29034205845426, 425.32058388846264]
[8.624503476279122, 6.346359729766846, 3.8622902120862688, 2.1578191178185597]
In the first two epochs the loss values are equal to those of mean_square_error, and during the third and fourth epochs they become equal to those of mean_square_logarithmic_error, which is the new loss that was set. So it seems that using train_on_batch makes it possible to change the loss function. Nevertheless, I want to stress again that this is basically what one would do in PyTorch to achieve the same result, with the difference that the behaviour of PyTorch (in this scenario and in my opinion) is more reliable.
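Why the logarithmic loss is so much smaller on this kind of target can be checked with a few numbers. The sketch below reimplements both losses in NumPy following their usual definitions (the mean of squared differences, and the mean of squared differences of log(1 + value)); the price-like values are hypothetical and are not taken from the run above.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def msle(y_true, y_pred):
    """Mean squared logarithmic error, computed on log(1 + value)."""
    return float(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

y_true = np.array([24.0, 21.6, 34.7, 33.4])  # hypothetical house prices
y_pred = np.array([10.0, 12.0, 20.0, 15.0])  # hypothetical predictions of a barely trained model

print("MSE :", mse(y_true, y_pred))
print("MSLE:", msle(y_true, y_pred))
```

Because the logarithm compresses large values, the two losses live on very different scales, so a sudden drop in the plotted loss at the switch does not by itself mean the model improved.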
I have built an LSTM model using the Keras library to predict duplicate questions on the official Quora dataset. The test labels are 0 or 1, where 1 indicates that the question pair is a duplicate. After training the model with model.fit, I test it using model.predict on the test data. The output is an array of values (probabilities) like the one below:
[ 0.00514298]
[ 0.15161049]
[ 0.27588326]
[ 0.00236167]
[ 1.80067325]
[ 0.01048524]
[ 1.43425131]
[ 1.99202418]
[ 0.54853892]
[ 0.02514757]
I am only showing the first 10 values in the array. I don't understand what these values mean or how to compare them against the test labels to calculate the test accuracy. I want the model to output binary predictions (0 or 1) rather than probabilities. Please refer to the last section of my code below:
sequence_1_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences_1 = embedding_layer(sequence_1_input)
x1 = lstm_layer(embedded_sequences_1)
sequence_2_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences_2 = embedding_layer(sequence_2_input)
y1 = lstm_layer(embedded_sequences_2)
merged = concatenate([x1, y1])
merged = Dropout(rate_drop_dense)(merged)
merged = BatchNormalization()(merged)
merged = Dense(num_dense, activation=act)(merged)
merged = Dropout(rate_drop_dense)(merged)
merged = BatchNormalization()(merged)
preds = Dense(1, activation='sigmoid')(merged)
## train the model
model = Model(inputs=[sequence_1_input, sequence_2_input], \
hist = model.fit([data_1_train, data_2_train], labels_train, \
validation_data=([data_1_val, data_2_val], labels_val, weight_val), \
epochs=200, batch_size=2048, shuffle=True, \
class_weight=class_weight, callbacks=[early_stopping, model_checkpoint])
preds = model.predict([test_data_1, test_data_2], batch_size=8192,
preds += model.predict([test_data_2, test_data_1], batch_size=8192,
preds /= 2
As you say, your output is a NumPy array of probabilities. You can convert it to binary labels with, for example, (model.predict(X) > 0.5).astype(int)
Artificial neural networks are probabilistic classifiers, so your output is absolutely fine. It is just the probability of belonging to your target label.
In addition, one interesting fact is that 0.5 may not be the threshold you want to use. It depends on how important true positives and false positives are in your task. You can look at the ROC curve to find the optimal threshold.
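As a minimal sketch of the thresholding step, the snippet below binarizes a column of probabilities and compares the result with the true labels. The probabilities, the labels and the helper accuracy_at are hypothetical; in practice the array would come from model.predict and should already be averaged so that every value lies between 0 and 1.

```python
import numpy as np

# Hypothetical sigmoid outputs of shape (n, 1), as model.predict returns them, and true labels.
probs = np.array([[0.005], [0.15], [0.90], [0.72], [0.55], [0.03]])
labels = np.array([0, 0, 1, 1, 0, 0])

def accuracy_at(probs, labels, threshold=0.5):
    """Binarize probabilities at `threshold` and return the fraction of correct labels."""
    predicted = (probs.ravel() > threshold).astype(int)
    return float(np.mean(predicted == labels))

acc_default = accuracy_at(probs, labels)        # threshold 0.5
acc_stricter = accuracy_at(probs, labels, 0.6)  # a higher threshold drops the false positive at 0.55

print(acc_default, acc_stricter)
```

Sweeping the threshold over a validation set in this way is a simple alternative to reading it off a ROC curve; which value is best depends on how costly false positives and false negatives are for the task.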
You can try changing your activation function to softmax in your last layer or you can make your own softmax function and pass your output to that function. Here's an example for a custom softmax function
import numpy as np
def softmax(x):
    e = np.exp(x - np.max(x, axis=0))  # subtract the max for numerical stability
    return e / np.sum(e, axis=0)