I am trying to use Keras to do nonlinear regression. I have simulated 90,000 datasets and labelled each with 2 parameters. My goal is to have a fully connected NN estimate these two parameters after training. Currently, the model works well when fitting only one label: as a test, I fitted each label independently and this works well. However, when I fit both simultaneously it fails, i.e., the model predicts one label accurately but not the other; depending on the activation of my output layer, the second label is in some instances off by a factor of 1000 and in other cases simply reads [0.]. One label is on the order of 1e7 and the other varies between 0 and 1. I have tried normalizing both labels to lie between 0 and 1; this didn't help. Each input is a vector of size 1024 associated with 2 labels.
Any help or literature suggestions on how to fit multi-labelled data would be much appreciated. Attached below is the code for my model. Thank you.
# Build The Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# The Input Layer :
model.add(Dense(1024, kernel_initializer='normal', input_dim=1024, activation='relu'))
# The Hidden Layers :
model.add(Dense(1024, kernel_initializer='normal',activation='relu'))
model.add(Dense(1024, kernel_initializer='normal',activation='relu'))
model.add(Dense(1024, kernel_initializer='normal',activation='relu'))
# The Output Layer :
model.add(Dense(2, kernel_initializer='normal', activation='relu'))
# Compile the network :
model.compile(loss='MSE', optimizer='adam', metrics=['MSE'])
model.summary()
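For what it's worth, a common approach to targets with very different scales (not from the original post) is to standardize each label column independently and use a linear output layer, so that the 1e7-scale label does not dominate the MSE and negative values are not clipped by relu. A minimal sketch, with X_train, y_train, and X_test as hypothetical arrays of shapes (N, 1024), (N, 2), and (M, 1024):

from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

scaler = StandardScaler()                  # zero mean, unit variance per label column
y_scaled = scaler.fit_transform(y_train)

model = Sequential()
model.add(Dense(1024, input_dim=1024, activation='relu'))
model.add(Dense(1024, activation='relu'))
model.add(Dense(2, activation='linear'))   # linear output: no clipping of negative values
model.compile(loss='mse', optimizer='adam')
model.fit(X_train, y_scaled, epochs=20, batch_size=128, validation_split=0.1)

y_pred = scaler.inverse_transform(model.predict(X_test))  # predictions back in original units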
The shape of the train/test data is (samples, 256, 256, 1). The training dataset has around 1400 samples, the validation dataset has 150 samples, and the test dataset has 250 samples. I built a CNN model for a six-class classification task. However, no matter how I tune the parameters or add/remove layers (convolutional and dense), I get chance-level accuracy all the time (around 16.5%). I would therefore like to know whether I made some fatal mistake while building the model, or whether something is wrong with the data itself rather than the CNN model.
Code:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def build_cnn_model(input_shape, activation='relu'):
    model = Sequential()
    # 3 convolution layers with max pooling
    model.add(Conv2D(64, (5, 5), activation=activation, padding='same', input_shape=input_shape))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(128, (5, 5), activation=activation, padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(256, (5, 5), activation=activation, padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    # 3 fully connected layers
    model.add(Dense(1024, activation=activation))
    model.add(Dropout(0.5))
    model.add(Dense(512, activation=activation))
    model.add(Dropout(0.5))
    model.add(Dense(6, activation='softmax'))  # 6 classes
    # summarize the model
    print(model.summary())
    return model

def compile_and_fit_model(model, X_train, y_train, X_vali, y_vali, batch_size, n_epochs, LR=0.01):
    # compile the model
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LR),
        loss='sparse_categorical_crossentropy',
        metrics=['sparse_categorical_accuracy'])
    # fit the model
    history = model.fit(x=X_train,
                        y=y_train,
                        batch_size=batch_size,
                        epochs=n_epochs,
                        verbose=1,
                        validation_data=(X_vali, y_vali))
    return model, history
I transformed the MEG data my professor recorded into magnitude scalograms using the CWT; pywt.cwt(data, scales, wavelet) was used. If I plot the coefficients I get from cwt, I obtain a graph like this (I merged 62 channels into one graph): [scalogram plot omitted]
I used the coefficients as train/test data for the CNN model. However, I tuned the parameters and tried adding/removing layers for the CNN model, and the classification accuracy was unchanged. Thus, I want to know where I made mistakes: did I make mistakes building the CNN model, or did I make mistakes with the CWT (the way I handled the data)?
Please give me some advice, thank you.
What is the accuracy on the training data? If you have a small dataset and the model does not overfit after training for a while, then something is wrong with the model. You can also test with existing datasets that the model should be able to handle (like Fashion MNIST).
Testing whether you handled the data correctly is harder. Did you write unit tests for the different steps in the preprocessing pipeline?
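For instance, a quick sanity check along those lines might look like the sketch below. It assumes the final Dense layer of build_cnn_model is changed from 6 to 10 units, since Fashion MNIST has 10 classes:

import tensorflow as tf

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
X_train = X_train[..., None] / 255.0  # shape (60000, 28, 28, 1), scaled to [0, 1]
X_test = X_test[..., None] / 255.0
model = build_cnn_model(input_shape=(28, 28, 1))  # assumes Dense(10, ...) as the output layer
model, history = compile_and_fit_model(model, X_train, y_train, X_test, y_test,
                                       batch_size=64, n_epochs=5, LR=0.001)
# If this reaches well above 10% accuracy, the model-building code itself is probably fine.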
This is the model I am trying to replicate (more information in linked paper):
In our models, we adopted one dropout layer between LSTM models and the first fully-connected layer and another dropout layer between the first fully-connected layer and the second fully-connected layer. Their masking probabilities are both set to 0.5.
...
For our proposed CBLSTM, one-layer CNN is firstly designed, whose filter number, filter size and pooling size are set to 150, 10 and 5. Therefore, the shape of the raw sensory sequence is changed from 100 x 12 to 19 x 150 after CNN. Then, a two-layer bi-directional LSTM is built on top of the CNN.
Backward and forward LSTMs share the same layer sizes as [150, 200]. Therefore, the output of the LSTM module is the concatenated vector of the representations learned by backward and forward LSTMs, and its dimensionality is 400. Then, before feeding the representation into the linear regression layer, two fully-connected layers with a size of [500, 600] are adopted. The nonlinearity activation functions in our proposed CBLSTM are all set to ReLu.
Source: Zhao, R., Yan, R., Wang, J., & Mao, K. (2017). Learning to monitor machine health with convolutional bi-directional LSTM networks. Sensors, 17(2), 273. link to paper
The input is 630 samples x 100 timesteps x 12 features.
How my model looks at the moment:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Bidirectional, LSTM, Dropout, Dense

model = Sequential()
model.add(Conv1D(filters=150, kernel_size=10, activation='relu', input_shape=(100, 12)))
model.add(MaxPooling1D(pool_size=5, strides=None, padding='valid'))
model.add(Bidirectional(LSTM(150, return_sequences=True), merge_mode='concat'))
model.add(Bidirectional(LSTM(200, return_sequences=False), merge_mode='concat'))
model.add(Dropout(0.5))
model.add(Dense(500, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(600, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['mae'])
While the training loss steadily decreases per epoch, the validation loss does not; it diverges pretty quickly. This indicates a mistake in my model which I have not yet been able to find. Any ideas as to what is wrong?
Side note: I am using the same input data as the authors.
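Not from the original thread, but one common mitigation when the validation loss diverges like this is early stopping on the validation loss, so training halts at the best epoch. A sketch, with X_train and y_train as hypothetical stand-ins for the 630 x 100 x 12 data and its labels:

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
history = model.fit(X_train, y_train,
                    validation_split=0.2,  # or pass an explicit validation set
                    epochs=200, batch_size=32,
                    callbacks=[early_stop])  # stops early and keeps the best weights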
To solve a problem I developed several deep learning models, such as an MLP (6 Dense layers), a CNN (1 Conv1D + 1 Dense layer), and an LSTM (1 LSTM + 2 Dense layers), and plotted their loss and accuracy charts.
CNN model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Conv1D, MaxPooling1D, Flatten,
                                     Dense, BatchNormalization, Activation, Dropout)

model = Sequential()
model.add(Embedding(len(vectorizer.get_feature_names()) + 1,  # vocabulary size (+1 for padding index)
                    64,  # embedding size
                    input_length=MAX_SEQ_LENGHT))  # vectorizer and MAX_SEQ_LENGHT are defined elsewhere
model.add(Conv1D(64, 5, activation='relu'))
model.add(MaxPooling1D(5))
model.add(Flatten())
model.add(Dense(64))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
I have these questions:
1. Which epoch is suitable for each one?
Chart 1 => 16
Chart 2 => 15
Chart 3 => 5
2. Do these charts show overfitting? (Especially chart 1: what is wrong with it? How is it possible that the test accuracy is higher than the training accuracy?)
3. Is it OK that the training loss is higher than the test loss (in chart 1)? Should I increase the number of epochs?
[Charts 1-3: loss and accuracy plots for the three models; images omitted.]
Your model seems to be overfitting in chart 1. (Test accuracy above training accuracy can also happen when dropout is used: dropout is active while training but disabled at evaluation time, so the reported training accuracy can sit below the test accuracy.) If you can't enlarge the dataset (directly or with data augmentation techniques), then chart 3 seems to show the best model of the three.
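As a side note (not part of the original answer), rather than reading the best epoch off a chart, you can pick it from the recorded training history. A sketch, with X_train, y_train, X_test, and y_test as hypothetical stand-ins for the data used to produce the charts:

import numpy as np

history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=20, batch_size=32)
# epochs are 0-indexed in history.history, hence the +1
best_epoch = int(np.argmin(history.history['val_loss'])) + 1
print('Best epoch:', best_epoch)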
I am trying to create a neural network based on the Iris dataset. I have a four-dimensional input: X = dataset[:,0:4].astype(float). Then I create a neural network with four nodes.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(4, input_dim=4, kernel_initializer='normal', activation='relu'))
model.add(Dense(3, kernel_initializer='normal', activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
As I understand it, I pass each dimension to a separate node: four dimensions, four nodes. When I create a neural network with 8 input nodes, how does it work? Performance is still the same as with 4 nodes.
model = Sequential()
model.add(Dense(8, input_dim=4, kernel_initializer='normal', activation='relu'))
model.add(Dense(3, kernel_initializer='normal', activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
You have an error in your last activation. Use softmax instead of sigmoid and run it again.
Replace
model.add(Dense(3, kernel_initializer='normal', activation='sigmoid'))
with
model.add(Dense(3, kernel_initializer='normal', activation='softmax'))
To answer your main question of "How does this work?":
From a conceptual standpoint, you are initially creating a fully-connected, or Dense, neural network with 3 layers: an input layer with 4 nodes, a hidden layer with 4 nodes, and an output layer with 3 nodes. Each node in the input layer has a connection to every node in the hidden layer, and same with the hidden to the output layer.
In your second example, you just increased the number of nodes in the hidden layer from 4 to 8. A larger network can be good, as it can be trained to "look" for more things in your data. But if a layer is too large you may overfit: the network memorizes too much of the training data, when it really just needs a general idea of it so that it can still recognize slightly different data, i.e., your test data.
The reason you did not see an increase in performance is likely either overfitting or your activation function: try a function other than relu in your hidden layer. If you don't see any improvement after trying a few different combinations, you are likely overfitting.
Hope this helps.
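Not part of the original answer, but for completeness, a runnable version of the corrected model might look like this (assuming one-hot encoded labels, which categorical_crossentropy expects):

from sklearn.datasets import load_iris
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

X, y = load_iris(return_X_y=True)     # 150 samples, 4 features, 3 classes
y = to_categorical(y, num_classes=3)  # one-hot targets for categorical_crossentropy

model = Sequential()
model.add(Dense(8, input_dim=4, kernel_initializer='normal', activation='relu'))
model.add(Dense(3, kernel_initializer='normal', activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=100, batch_size=8, verbose=0)
print(model.evaluate(X, y, verbose=0))  # [loss, accuracy] on the training set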
Let's say that my training data is a 3D numpy array with dimensions (4155, 5, 150). This data consists of 4155 training samples, each a 5 x 150 matrix, and my labels are 4155 values, one per sample. I then want to feed it to the following architecture:
model = Sequential()
model.add(Convolution1D(input_dim=4,
                        input_length=1000,
                        nb_filter=320,
                        filter_length=26,
                        border_mode="valid",
                        activation="relu",
                        subsample_length=1))
model.add(MaxPooling1D(pool_length=13, stride=13))
model.add(Dropout(0.2))
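# brnn: a bidirectional recurrent layer, assumed to be defined earlier in the script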
model.add(brnn)
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(input_dim=75*640, output_dim=925))
model.add(Activation('relu'))
model.add(Dense(input_dim=925, output_dim=919))
model.add(Activation('sigmoid'))
The problem is that I don't know how to change the dimensionality of my input array so that it fits this model. The layer parameters specified above are just an example; I simply want to use a convolutional layer, followed by a bidirectional LSTM, and finally two fully connected layers. Does anyone have an idea?
Thanks in advance!
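Not an answer from the original thread, but a sketch of that architecture in the current Keras API, shaped for the (4155, 5, 150) input. The filter counts and layer sizes are placeholders, and the single linear output assumes one regression label per sample:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Dropout, Bidirectional, LSTM, Flatten, Dense

model = Sequential()
# treat the 5 rows as timesteps and the 150 columns as per-step features
model.add(Conv1D(filters=64, kernel_size=3, padding='same', activation='relu', input_shape=(5, 150)))
model.add(MaxPooling1D(pool_size=2))  # time axis: 5 -> 2
model.add(Dropout(0.2))
model.add(Bidirectional(LSTM(32, return_sequences=True)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(1))  # one label per sample
model.compile(loss='mse', optimizer='adam')
model.summary()

X = np.random.rand(4155, 5, 150)  # stand-in for the real data
y = np.random.rand(4155)
model.fit(X, y, epochs=1, batch_size=64)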