I have two datasets, both with many rows. In the first dataset each row has 10 columns (extracted features); in the second dataset each row has 18 columns (extracted features).
Now I would like to train a recurrent neural network (in Keras) on both datasets, even though they have a different input_size (number of columns). This is my code:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

time_steps = 1      # one time step per row of features
input_size = 10     # extracted features, dataset 1
BATCH_SIZE = 1
num_class = 7
n_hidden_units = 64  # number of LSTM units

model = Sequential()
# RNN cell
model.add(LSTM(batch_input_shape=(BATCH_SIZE, time_steps, input_size),
               units=n_hidden_units, recurrent_dropout=0.3,
               activation='tanh', recurrent_activation='hard_sigmoid'))
# output layer
model.add(Dense(num_class))
model.add(Activation('softmax'))
# optimizer
# adam = Adam(LR)
# compile model
model.compile(optimizer='nadam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, target, epochs=25, batch_size=1, verbose=False)
prediction = model.predict_classes(data_test, batch_size=1)
The above code is only for the first dataset, and as you can see input_size=10 equals the number of features in dataset one.
My question is: how can I train a recurrent network like the one above when input_size varies between datasets?
Pad all the feature vectors to the same length before training, using zero padding, so that both datasets share a single input_size.
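For example, a minimal sketch (the array names here are placeholders, not from your code): zero-pad the feature axis of the 10-column dataset so that both datasets share input_size = 18, then train one model on the padded data.

import numpy as np

# Placeholder arrays shaped (samples, time_steps, features)
data1 = np.random.rand(500, 1, 10)   # dataset one: 10 extracted features
data2 = np.random.rand(500, 1, 18)   # dataset two: 18 extracted features

target_features = max(data1.shape[-1], data2.shape[-1])   # 18

# Zero-pad the last (feature) axis of the smaller dataset
pad_width = ((0, 0), (0, 0), (0, target_features - data1.shape[-1]))
data1_padded = np.pad(data1, pad_width, mode='constant')

print(data1_padded.shape)   # (500, 1, 18) -- both datasets now share one input_size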
I have trained a VGG16 model on a total of 1000 images for 5 classes (200 images per class). I have used data augmentation, stratified K-fold, and dropout to train the model. The train accuracy and val accuracy are good. However, when I run predictions on the trained model with the test dataset, the confusion matrix does not match the train accuracy.
Train & val accuracy, classification report and confusion matrix: https://i.stack.imgur.com/MAPXC.png, https://i.stack.imgur.com/OIX3O.png
VGG model:
def create_model():
    # A CNN is a multilayered neural network with a special architecture to detect complex features in data.
    # VGG16 = Visual Geometry Group
    # 16 refers to the 16 layers that have weights
    # VGG16 has about 138 million parameters
    # 3x3 filters with stride 1
    # max-pool layers of 2x2 filters with stride 2
    # Conv-1 layer has 64 filters
    # Conv-2 has 128 filters
    # Conv-3 has 256 filters
    # Conv-4 and Conv-5 have 512 filters

    # import libraries
    from tensorflow.keras.models import Model
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Conv2D, Flatten, Dropout

    # number of species
    NO_CLASSES = 5

    # load the VGG16 model as the base model for training,
    # excluding the fully connected layers
    base_model = VGG16(include_top=False, input_shape=(224, 224, 3))

    # add layers
    x = base_model.output
    x = Conv2D(64, (3, 3), activation='relu')(x)  # 64 feature maps, 3x3 kernel
    x = GlobalAveragePooling2D()(x)  # use global average pooling; there is no "min pooling" that would pick the darkest pixels (our dataset has a white background)

    # add dense layers so that the model can learn more complex functions and classify for better results
    # Dense layer = a layer that is densely connected with its preceding layer
    x = Flatten()(x)  # fully connected layers only accept 1D input
    x = Dense(1024, activation='relu')(x)
    x = Dense(1024, activation='relu')(x)  # dense layer 2
    x = Dense(512, activation='relu')(x)   # dense layer 3
    x = Dropout(0.2)(x)  # reduce dependency between neurons

    # final layer with softmax activation for multiclass classification
    preds = Dense(NO_CLASSES, activation='softmax')(x)

    # create a new model with the base model's original input and the new output
    model = Model(inputs=base_model.input, outputs=preds)

    # freeze the first 19 layers (0..18) because we don't want their weights to change during training
    for layer in model.layers[:19]:
        layer.trainable = False
    # train the rest of the layers - 19 onwards
    for layer in model.layers[19:]:
        layer.trainable = True

    # compile the model
    model.compile(optimizer='Adam',  # Adam optimizer: low training cost, good performance
                  loss='categorical_crossentropy',  # for a multi-class (mutually exclusive classes) problem
                  metrics=['accuracy'])  # calculate accuracy
    return model
Stratified K fold & model fit
from sklearn.model_selection import StratifiedKFold
from statistics import mean, stdev

EPOCHS = 6
histories = []
# datagen, width, height and train_dataset are defined earlier in the notebook
model = create_model()

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)
for f, (trn_ind, val_ind) in enumerate(kfold.split(train_dataset.Image_path, train_dataset.labels)):
    print(); print("#" * 50)
    print("Fold: ", f + 1)
    print("#" * 50)
    train_ds = datagen.flow_from_dataframe(train_dataset.loc[trn_ind, :],
                                           x_col='Image_path', y_col='labels',
                                           target_size=(width, height),
                                           class_mode='categorical', color_mode='rgb',
                                           batch_size=16, shuffle=True)
    val_ds = datagen.flow_from_dataframe(train_dataset.loc[val_ind, :],
                                         x_col='Image_path', y_col='labels',
                                         target_size=(width, height),
                                         class_mode='categorical', color_mode='rgb',
                                         batch_size=16, shuffle=True)
    # define start and end epoch for each fold
    fold_start_epoch = f * EPOCHS
    fold_end_epoch = EPOCHS * (f + 1)
    step_size_train = train_ds.n // train_ds.batch_size
    # fit
    history = model.fit(train_ds,
                        initial_epoch=fold_start_epoch,
                        epochs=fold_end_epoch,
                        validation_data=val_ds,
                        shuffle=True,
                        steps_per_epoch=step_size_train,
                        verbose=1)
    # store the history for each fold
    histories.append(history)
Does this happen because of the dataset itself or because of a problem in the code? I hope to find the mistake.
We used VGG16, froze all layers except the last 4, and retrained those on a gender dataset of 12k male and 12k female images (we are using the IMDB dataset). It gives very low accuracy, especially for males: on female test data it outputs "female", but on male test data it gives the same output.
from keras import models, layers, optimizers
from keras.applications import VGG16
from keras.preprocessing import image

vgg_conv = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the layers except the last 4 layers
for layer in vgg_conv.layers[:-4]:
    layer.trainable = False

# Create the model
model = models.Sequential()
# Add the VGG convolutional base model
model.add(vgg_conv)
# Add new layers
model.add(layers.Flatten())
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(2, activation='softmax'))

nTrain = 16850
nTest = 6667
train_datagen = image.ImageDataGenerator(rescale=1./255)
test_datagen = image.ImageDataGenerator(rescale=1./255)
batch_size = 12
batch_size1 = 12

train_generator = train_datagen.flow_from_directory(train_dir, target_size=(224, 224), batch_size=batch_size, class_mode='categorical', shuffle=False)
test_generator = test_datagen.flow_from_directory(test_dir, target_size=(224, 224), batch_size=batch_size1, class_mode='categorical', shuffle=False)

model.compile(optimizer=optimizers.RMSprop(lr=1e-6), loss='categorical_crossentropy', metrics=['acc'])

history = model.fit_generator(train_generator,
                              steps_per_epoch=train_generator.samples / train_generator.batch_size,
                              epochs=3,
                              validation_data=test_generator,
                              validation_steps=test_generator.samples / test_generator.batch_size,
                              verbose=1)
model.save('gender.h5')
Testing Code:
from keras.models import load_model
from keras.preprocessing.image import load_img, img_to_array
from keras.applications.vgg16 import preprocess_input, decode_predictions

model = load_model('age.h5')
img = load_img('9358807_1980-12-28_2010.jpg', target_size=(224, 224))
img = img_to_array(img)
img = img.reshape((1, img.shape[0], img.shape[1], img.shape[2]))
img = preprocess_input(img)
yhat = model.predict(img)
print(yhat.size)
label = decode_predictions(yhat)
label = label[0][0]
print('%s (%.2f%%)' % (label[1], label[2] * 100))
Firstly, you are saving the model as gender.h5 but during testing you are loading the model age.h5. Probably you have pasted different code for the testing here.
Coming to improving the accuracy of the program -
Most importantly, you are using loss='categorical_crossentropy'; change it to loss='binary_crossentropy' in model.compile, as you have just 2 classes. So your compile call becomes:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Also change class_mode='categorical' to class_mode='binary' in flow_from_directory.
As categorical_crossentropy goes hand in hand with a softmax activation in the last layer, if you change the loss to binary_crossentropy the last activation should also be changed to sigmoid. So the last layer should be Dense(1, activation='sigmoid'). (A minimal sketch of these changes follows below.)
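Put together, a small sketch of these two changes (assuming the rest of your Sequential model stays as it is):

# Replace the final Dense(2, 'softmax') layer with a single sigmoid unit
model.add(layers.Dense(1, activation='sigmoid'))

# Compile with the binary loss that matches the sigmoid output
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])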
You have added 2 Dense layers of 4096 units each; that alone adds 4096 * 4096 = 16,777,216 weights to be learnt by the model. Reduce them, maybe to 1024 and 512 respectively.
You have added a Dropout layer of 0.5, which turns off 50% of the neurons during each training step. That is a large fraction. It is better to drop the Dropout layer and use it only if your model is overfitting.
Set batch_size = 1. As you have very little data, let every epoch have as many steps as there are input records.
Use data augmentation techniques like horizontal_flip, vertical_flip, shear_range, and zoom_range in ImageDataGenerator to generate new batches of training and validation images during every epoch (see the sketch below).
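A minimal sketch of such a generator (train_dir and the image size are taken from your snippet; the augmentation values themselves are example values):

from keras.preprocessing import image

train_datagen = image.ImageDataGenerator(
    rescale=1./255,
    horizontal_flip=True,
    vertical_flip=True,
    shear_range=0.2,
    zoom_range=0.2)

train_generator = train_datagen.flow_from_directory(
    train_dir,                 # same directory as in your code
    target_size=(224, 224),
    batch_size=1,              # batch_size = 1 as suggested above
    class_mode='binary',       # binary labels for the 2-class problem
    shuffle=True)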
Train your model for a larger number of epochs. You are training for only epochs=3, which is too few for learning the weights. Train for epochs=50 and trim the number later.
Hope this answers your question. Happy Learning.
I am trying to set up a neural network for identifying Elliott Waves and I was wondering if it is possible to pass an array of arrays into a perceptron? My plan is to pass an array of size 4 ([Open, Close, High, Low]) into each perceptron. If so, how would the weighted average calculation work, and how can I go about this using the Python Keras library? Thanks!
This is a pretty standard Fully Connected Neural Network to build. I assume that you have a classification problem:
from keras.layers import Input, Dense
from keras.models import Model
# I assume that x is the array containing the training data;
# the shape of x should be (num_samples, 4).
# The array containing the target labels is named y and is
# one-hot encoded with a shape of (num_samples, num_classes).
# num_samples is the number of samples in your training set,
# num_classes is the number of classes you have,
# e.g. in a binary classification problem num_classes = 2.
num_classes = 2  # set this to the number of classes in your problem

# First, we'll define the architecture of the network
inp = Input(shape=(4,))  # you have 4 features
hidden = Dense(10, activation='sigmoid')(inp)  # 10 neurons in your hidden layer
out = Dense(num_classes, activation='softmax')(hidden)
# Create the model
model = Model(inputs=[inp], outputs=[out])
# Compile the model and define the loss function and optimizer
model.compile(loss='categorical_crossentropy', optimizer='adam',
metrics=['accuracy'])
# feel free to change these to suit your needs
# Train the model
model.fit(x, y, epochs=10, batch_size=512)
# train the model for 10 epochs with a batch size of 512
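If your labels start out as plain integers, they can be one-hot encoded into the shape the comments above assume; a small sketch (the label values are placeholders):

import numpy as np
from keras.utils import to_categorical

labels = np.array([0, 1, 1, 0, 1])   # example integer class labels
num_classes = 2
y = to_categorical(labels, num_classes=num_classes)   # shape (num_samples, num_classes)
print(y)
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]
#  [0. 1.]]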
From the official example in Keras docs, the stacked LSTM classifier is trained using categorical_crossentropy as a loss function, as expected. https://keras.io/getting-started/sequential-model-guide/#examples
But the y_train values are seeded using numpy.random.random(), which outputs real numbers, versus the 0/1 values typical of binary classification.
Are the y_train values being promoted to 0/1 values under the hood?
Can you even train this loss function against real values between 0 and 1?
How is accuracy then calculated?
Confusing, no?
from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np
data_dim = 16
timesteps = 8
num_classes = 10
# expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential()
model.add(LSTM(32, return_sequences=True,
input_shape=(timesteps, data_dim))) # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True)) # returns a sequence of vectors of dimension 32
model.add(LSTM(32)) # return a single vector of dimension 32
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
# Generate dummy training data
x_train = np.random.random((1000, timesteps, data_dim))
y_train = np.random.random((1000, num_classes))
# Generate dummy validation data
x_val = np.random.random((100, timesteps, data_dim))
y_val = np.random.random((100, num_classes))
model.fit(x_train, y_train,
batch_size=64, epochs=5,
validation_data=(x_val, y_val))
For this example, y_train and y_val are no longer one-hot encodings but probabilities of each class, so cross-entropy is still applicable. We can treat one-hot encoding as a special case of a probability vector.
y_train[0]
array([0.30172708, 0.69581121, 0.23264601, 0.87881279, 0.46294832,
0.5876406 , 0.16881395, 0.38856604, 0.00193709, 0.80681196])
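As a quick numeric check (a sketch, not part of the Keras example): categorical cross-entropy is just -sum(y_true * log(y_pred)) over the classes, so it is well defined for any non-negative target vector, one-hot or not.

import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # -sum over classes of y_true * log(y_pred)
    return -np.sum(y_true * np.log(y_pred + eps), axis=-1)

y_pred  = np.array([0.1, 0.2, 0.7])    # e.g. a softmax output
one_hot = np.array([0.0, 0.0, 1.0])    # one-hot target
soft    = np.array([0.1, 0.3, 0.6])    # "soft" target, like the y_train rows above

print(categorical_crossentropy(one_hot, y_pred))   # ~0.36
print(categorical_crossentropy(soft, y_pred))      # ~0.93, still well defined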
I am trying to build an LSTM autoencoder to predict time series data. Since I am new to Python I have mistakes in the decoding part. I tried to build it up like here and like the Keras example, but I could not understand the difference between the given examples at all. The code that I have right now looks like this:
Question 1: How do I choose batch_size and input_dim when each sample has 2000 values?
Question 2: How do I get this LSTM autoencoder working (the model and the prediction)? This is just the model, but how do I predict? Say, predicting from sample 10 until the end of the data?
My data has 1500 samples in total, I would go with 10 time steps (or more if better), and each sample has 2000 values. If you need more information I can include it later as well.
import numpy as np
from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

trainX = np.reshape(data, (1500, 10, 2000))

# parameters
timesteps = 10
input_dim = 2000
units = 100      # unit number chosen randomly
batch_size = 2000
epochs = 20

# model
inpE = Input((timesteps, input_dim))
outE = LSTM(units=units, return_sequences=False)(inpE)
encoder = Model(inpE, outE)

inpD = RepeatVector(timesteps)(outE)
outD = LSTM(input_dim, return_sequences=True)(inpD)
decoder = Model(inpD, outD)
autoencoder = Model(inpE, outD)

autoencoder.compile(loss='mean_squared_error',
                    optimizer='rmsprop',
                    metrics=['accuracy'])

autoencoder.fit(trainX, trainX,
                batch_size=batch_size,
                epochs=epochs)

encoderPredictions = encoder.predict(trainX)
The LSTM model that I use is this one:
from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

# timesteps, input_dim and n_dimensions are set to match the data described below
def get_model(n_dimensions):
    inputs = Input(shape=(timesteps, input_dim))
    encoded = LSTM(n_dimensions, return_sequences=False, name='encoder')(inputs)
    decoded = RepeatVector(timesteps)(encoded)
    decoded = LSTM(input_dim, return_sequences=True, name='decoder')(decoded)
    autoencoder = Model(inputs, decoded)
    encoder = Model(inputs, encoded)
    return autoencoder, encoder

autoencoder, encoder = get_model(n_dimensions)
autoencoder.compile(optimizer='rmsprop', loss='mse',
                    metrics=['acc', 'cosine_proximity'])

history = autoencoder.fit(x, x, batch_size=100, epochs=100)
encoded = encoder.predict(x)
It works with the data that I have; x has shape (3000, 180, 40), that is 3000 samples with timesteps=180 and input_dim=40.
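If you also want the reconstructed sequences rather than the compressed codes, you can call predict on the autoencoder itself; a short sketch using the same x:

import numpy as np

# Reconstruct the input sequences with the trained autoencoder
reconstructed = autoencoder.predict(x)   # shape (3000, 180, 40), same as x

# Per-sample reconstruction error, e.g. mean squared error
mse_per_sample = np.mean((x - reconstructed) ** 2, axis=(1, 2))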