Okay so here's my CNN (simple example from a tutorial) along with some arithmetic to get the total number of free parameters.
We've got a dataset of 28x28 grayscale images (MNIST).
First layer is a 2D convolution using 32 3x3 kernels. Dimensionality of the output is 26x26x32 (kernel stride was 1 and we have 32 feature maps of 26x26). Running parameter count (weights only, ignoring biases): 32 * 3 * 3 = 288
Second layer is a 2x2 MaxPool with stride 2. Dimensionality of the output is 13x13x32, which we then flatten into a vector of length 5408. No extra parameters here.
Third layer is Dense: a 5408x100 weight matrix. Dimensionality of the output is 100. Running parameter count: 288 + 540,800 = 541,088
Fourth layer is also Dense: a 100x10 weight matrix. Dimensionality of the output is 10. Running parameter count: 541,088 + 1,000 = 542,088
Then we're supposed to do stochastic gradient descent over a roughly 542,000-dimensional parameter space!
That feels like a ridiculously big number to me. And this is meant to be the hello world problem of CNNs. Am I missing something fundamental in my understanding of how this is meant to work? Or maybe the number is correct but it's not actually a big deal for a computer to crunch?
In case it helps, here is how the model was built in Keras:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import SGD

def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3,3), activation='relu', kernel_initializer='he_uniform', input_shape=(28,28,1)))
    model.add(MaxPooling2D((2,2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
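For what it's worth, letting Keras count the parameters is a quick sanity check on the arithmetic above; note that model.summary() also counts the bias terms (32 + 100 + 10 of them), so it reports a slightly larger total than the weights-only tally:

model = define_model()
model.summary()
# Conv2D: 32*(3*3*1) weights + 32 biases   =     320
# Dense:  5408*100   weights + 100 biases  = 540,900
# Dense:  100*10     weights + 10 biases   =   1,010
print(model.count_params())  # 542230 trainable parameters in total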
Related
I am working with a Sequential Keras model and I am trying to figure out the best method for feature scaling.
model = Sequential()
model.add(Masking(mask_value=-50, input_shape=(None,10)))
model.add(LayerNormalization(axis=-1))
model.add(LSTM(100, input_shape=(None,10)))
model.add(Dense(100, activation='relu'))
model.add(Dense(3, activation='softmax'))
print(model.summary())
In line 3, I have a LayerNormalization layer which, according to the documentation, normalizes activations to zero mean and unit standard deviation. However, I have also come across BatchNormalization and tf.keras.layers.experimental.preprocessing.Normalization. My question is: is this method similar to sklearn's StandardScaler(), or is there another method I could use to feature-scale within the model?
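For context, here is a minimal sketch (shapes made up for illustration) of how the preprocessing Normalization layer mentioned above gets adapted to the training data up front, which is the part that resembles fitting a StandardScaler:

import numpy as np
import tensorflow as tf

# toy data: 32 sequences, 5 timesteps, 10 features (shapes made up for illustration)
x_train = np.random.rand(32, 5, 10).astype("float32")

# learn per-feature mean/variance from the training data, like StandardScaler.fit()
norm = tf.keras.layers.experimental.preprocessing.Normalization(axis=-1)
norm.adapt(x_train)

model = tf.keras.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(None, 10)))
model.add(norm)                                # standardizes each of the 10 features
model.add(tf.keras.layers.LSTM(100))
model.add(tf.keras.layers.Dense(3, activation='softmax'))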
This should work. It uses an UpSampling2D layer in a naive example: a Dense layer produces 128 5x5 feature maps, which are upsampled to 10x10 and then convolved down to a single output image:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Reshape, UpSampling2D, Conv2D

# define model
model = Sequential()
# define input shape, output enough activations for 128 5x5 feature maps
model.add(Dense(128 * 5 * 5, input_dim=100))
# reshape vector of activations into 128 feature maps of shape 5x5
model.add(Reshape((5, 5, 128)))
# double the spatial size from 128 5x5 feature maps to 128 10x10 feature maps
model.add(UpSampling2D())
# fill in detail in the upsampled feature maps and output a single image
model.add(Conv2D(1, (3,3), padding='same'))
# summarize model
model.summary()
But you can use the Conv2DTranspose layer too, which combines the UpSampling2D and Conv2D layers into one layer.
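For example, a roughly equivalent sketch using Conv2DTranspose instead (keeping a single 3x3 output filter is my assumption, not part of the original answer):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Reshape, Conv2DTranspose

model = Sequential()
model.add(Dense(128 * 5 * 5, input_dim=100))
model.add(Reshape((5, 5, 128)))
# strides=(2,2) doubles the 5x5 maps to 10x10 while learning the upsampling filters
model.add(Conv2DTranspose(1, (3,3), strides=(2,2), padding='same'))
model.summary()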
In the case of LSTMs, a TimeDistributed layer will help; refer to the Keras documentation for TimeDistributed.
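For instance, a minimal sketch (layer sizes are arbitrary) of wrapping a Dense layer in TimeDistributed so the same Dense is applied at every time step of the LSTM's output sequence:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed

model = Sequential()
# return_sequences=True keeps the full sequence of LSTM outputs
model.add(LSTM(32, return_sequences=True, input_shape=(None, 10)))
# the same Dense(1) weights are applied independently at each time step
model.add(TimeDistributed(Dense(1)))
model.summary()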
I'm trying to build a model that predicts the price of a certain commodity based on current market conditions. My data are shaped similarly to:
import numpy as np

num_samples = 100
sample_dimension = 10
XXX = np.random.random((num_samples, sample_dimension)).reshape(-1, 1, sample_dimension)
YYY = np.random.random(num_samples).reshape(-1, 1)
so I've got 100 ordered samples of X data, each consisting of 10 variables. My model looks like the following
model = keras.Sequential()
model.add(tf.keras.layers.Conv1D(4,
kernel_size = (2),
activation='sigmoid',
input_shape=(None, sample_dimension),
batch_input_shape = [1,1,sample_dimension]))
model.add(tf.keras.layers.AveragePooling1D(pool_size=2))
model.add(tf.keras.layers.Reshape((1, sample_dimension)))
model.add(tf.keras.layers.LSTM(100,
stateful = True,
return_sequences=False,
activation='sigmoid'))
model.add(keras.layers.Dense(1))
model.compile(optimizer='adam',
loss='mean_squared_error',
metrics=['accuracy'])
so it's a 1D convolution, a pooling, a reshape (so it plays nice with the LSTM), and then a Dense layer casting down to a single prediction.
But when I try to run it, I get the following error:
Negative dimension size caused by subtracting 2 from 1 for 'conv1d/conv1d' (op: 'Conv2D') with input shapes: [1,1,1,10], [1,2,10,4].
I've tried a few different values for the kernel size, pool size, and batch_input_shape (I have to batch my inputs because my actual data are spread across several large files, so I want to read them one at a time and feed each into training the model), but nothing seems to work.
What am I doing wrong? How can I track/predict the shape of my data as it goes through this model? What are the data/variables supposed to look like?
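One way to track the shapes is to build the model one layer at a time and print the running output shape; a minimal sketch, assuming a (steps, channels) input layout:

import tensorflow as tf

sample_dimension = 10
model = tf.keras.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(sample_dimension, 1)))  # assumed layout
model.add(tf.keras.layers.Conv1D(4, kernel_size=2, activation='sigmoid'))
print(model.output_shape)   # (None, 9, 4): a kernel of size 2 shortens the sequence by 1
model.add(tf.keras.layers.AveragePooling1D(pool_size=2))
print(model.output_shape)   # (None, 4, 4): pooling halves (and floors) the sequence length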
I ended up looking through tutorials for Conv2D and then converting things to Conv1D (please edit as you feel appropriate).
Conv2D solution
model = keras.Sequential()
model.add(tf.keras.layers.Conv2D(4,
kernel_size = (**1**,2),
activation = 'sigmoid',
input_shape = (**1**,sample_dimension,1),
batch_input_shape = [None,**1**,sample_dimension,1]))
model.add(tf.keras.layers.AveragePooling2D(pool_size=(1,2)))
#model.add(tf.keras.layers.Reshape((1,sample_dimension)))
model.add(tf.keras.layers.Flatten())
model.add(keras.layers.Dense(1))
Then I converted it to Conv1D by taking out a dimension from each of the necessary arguments (the bold 1s):
model = keras.Sequential()
model.add(tf.keras.layers.Conv1D(4,
kernel_size = 2,
activation = 'sigmoid',
input_shape = (sample_dimension,1),
batch_input_shape = [None,sample_dimension,1]))
model.add(tf.keras.layers.AveragePooling1D(pool_size=2))
#model.add(tf.keras.layers.Reshape((1,sample_dimension)))
model.add(tf.keras.layers.Flatten())
model.add(keras.layers.Dense(1))
I guess the key takeaway is that the Keras convolution layers don't operate on plain vectors or matrices: the last axis always has to be a channel/feature dimension. In this case each sample is a sequence of length sample_dimension with a single channel, so the input shape has to be (sample_dimension, 1) rather than just (sample_dimension,).
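In other words, the data just needs an explicit channel axis; a minimal sketch with the same made-up shapes as above:

import numpy as np

num_samples = 100
sample_dimension = 10

# 100 samples of 10 variables each
XXX = np.random.random((num_samples, sample_dimension))

# Conv1D expects (samples, steps, channels), so add a trailing channel axis of size 1
XXX = XXX.reshape(num_samples, sample_dimension, 1)
print(XXX.shape)  # (100, 10, 1) -- matches input_shape=(sample_dimension, 1)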
This is the model I am trying to replicate (more information in linked paper):
In our models, we adopted one dropout layer between LSTM models and the first fully-connected layer and another dropout layer between the first fully-connected layer and the second fully-connected layer. Their masking probabilities are both set to 0.5.
...
For our proposed CBLSTM, one-layer CNN is firstly designed, whose filter number, filter size and pooling size are set to 150, 10 and 5. Therefore, the shape of the raw sensory sequence is changed from 100 x 12 to 19 x 150 after CNN. Then, a two-layer bi-directional LSTM is built on top of the CNN.
Backward and forward LSTMs share the same layer sizes as [150, 200]. Therefore, the output of the LSTM module is the concatenated vector of the representations learned by backward and forward LSTMs, and its dimensionality is 400. Then, before feeding the representation into the linear regression layer, two fully-connected layers with a size of [500, 600] are adopted. The nonlinearity activation functions in our proposed CBLSTM are all set to ReLu.
Source: Zhao, R., Yan, R., Wang, J., & Mao, K. (2017). Learning to monitor machine health with convolutional bi-directional LSTM networks. Sensors, 17(2), 273. link to paper
The input is 630 samples x 100 timesteps x 12 features.
How my model looks at the moment:
model = Sequential()
model.add(Conv1D(filters=150, kernel_size=10, activation='relu', input_shape=(100,12)))
model.add(MaxPooling1D(pool_size=5, strides=None, padding='valid'))
model.add(Bidirectional(LSTM(150, return_sequences=True), merge_mode='concat'))
model.add(Bidirectional(LSTM(200, return_sequences=False), merge_mode='concat'))
model.add(Dropout(0.5))
model.add(Dense(500, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(600, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['mae'])
While the training loss steadily decreases per epoch, the validation loss does not; it diverges pretty quickly. This indicates there is a mistake in my model which I have not yet been able to find. Any ideas as to what is wrong?
Side note: I am using the same data as input as the authors.
I'm creating a model to classify whether the input waveform contains a rising edge of the SDA line of an I2C bus.
Each input has 20,000 data points, and I have 100 training samples.
I initially found an answer regarding the input shape here: Keras 1D CNN: How to specify dimension correctly?
However, I'm getting an error in the activation function:
ValueError: Error when checking target: expected activation_1 to have 3 dimensions, but got array with shape (100, 1)
My model is:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, BatchNormalization, MaxPooling1D, Dense, Activation
from tensorflow.keras.optimizers import Adam
import numpy as np

model = Sequential()
model.add(Conv1D(filters=n_filter,
                 kernel_size=input_filter_length,
                 strides=1,
                 activation='relu',
                 input_shape=(20000, 1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=4, strides=None))
model.add(Dense(1))
model.add(Activation("sigmoid"))

adam = Adam(learning_rate=learning_rate)
model.compile(optimizer=adam, loss='binary_crossentropy', metrics=['accuracy'])

model.fit(train_data, train_label,
          epochs=10,
          batch_size=batch_size, shuffle=True)

score = np.asarray(model.evaluate(test_new_data, test_label, batch_size=batch_size)) * 100.0
I can't figure out the problem here, or why the activation layer expects a 3D tensor.
The problem lies in the fact that, starting from Keras 2.0, a Dense layer applied to a sequence is applied to each time step separately, so given a sequence it produces a sequence. Your Dense layer is therefore producing a sequence of 1-element vectors, and this is what causes your problem (as your target is not a sequence).
There are several ways to reduce a sequence to a vector and then apply a Dense layer to it:
GlobalPooling:
You may use GlobalPooling layers like GlobalAveragePooling1D or GlobalMaxPooling1D, e.g.:
model.add(Conv1D(filters=n_filter,
kernel_size=input_filter_length,
strides=1,
activation='relu',
input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(GlobalMaxPooling1D())
model.add(Dense(1))
model.add(Activation("sigmoid"))
Flattening:
You might collapse the whole sequence into a single vector using a Flatten layer:
model.add(Conv1D(filters=n_filter,
kernel_size=input_filter_length,
strides=1,
activation='relu',
input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=4, strides=None))
model.add(Flatten())
model.add(Dense(1))
model.add(Activation("sigmoid"))
RNN Postprocessing:
You could also add a recurrent layer on top of your sequence and make it return only the last output:
model.add(Conv1D(filters=n_filter,
kernel_size=input_filter_length,
strides=1,
activation='relu',
input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=4, strides=None))
model.add(SimpleRNN(10, return_sequences=False))
model.add(Dense(1))
model.add(Activation("sigmoid"))
Conv1D produces an output with 3 dimensions (and it stays 3-dimensional until the Dense layer).
Conv output: (BatchSize, Length, Filters)
For the Dense layer to output only one result, you need to add a Flatten() or Reshape((shape)) layer first, to make the tensor (BatchSize, Length) only.
If you call model.summary(), you will see exactly what shape each layer is outputting. You have to adjust the output to be exactly the same shape as the array you pass as the correct results. The None that appears in those shapes is the batch size and may be ignored.
About your model: I think you need more convolution layers, reducing the number of filters gradually, because condensing so much data in a single Dense layer does not usually bring good results.
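For instance, a rough sketch of that idea (the filter counts and kernel sizes here are arbitrary assumptions, not values from the answer):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

model = Sequential()
# gradually reduce the number of filters while pooling shrinks the sequence
model.add(Conv1D(64, kernel_size=7, activation='relu', input_shape=(20000, 1)))
model.add(MaxPooling1D(pool_size=4))
model.add(Conv1D(32, kernel_size=7, activation='relu'))
model.add(MaxPooling1D(pool_size=4))
model.add(Conv1D(16, kernel_size=7, activation='relu'))
model.add(MaxPooling1D(pool_size=4))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.summary()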
About dimensions: keras layers tutorial and samples
Let's say that my training data is a 3D numpy array with dimensions (4155, 5, 150). This data consists of 4155 training samples, each one a 5x150 matrix, while my labels are a vector of length 4155. I then want to feed it to the following architecture:
model = Sequential()
model.add(Convolution1D(input_dim=4,
input_length=1000,
nb_filter=320,
filter_length=26,
border_mode="valid",
activation="relu",
subsample_length=1))
model.add(MaxPooling1D(pool_length=13, stride=13))
model.add(Dropout(0.2))
model.add(brnn)
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(input_dim=75*640, output_dim=925))
model.add(Activation('relu'))
model.add(Dense(input_dim=925, output_dim=919))
model.add(Activation('sigmoid'))
The problem is that I don't know how to change the dimensionality of my input array so that it fits this model. The layer parameters specified above are just an example; I simply want to use a convolutional layer, followed by a bidirectional LSTM, and finally two fully connected layers. Does anyone have an idea?
Thanks in advance!
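For what it's worth, a minimal sketch in the current Keras API of the intended Conv1D + bidirectional LSTM + two dense layers pipeline for (5, 150)-shaped samples (all layer sizes and the binary output are assumptions, not taken from the question):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Bidirectional, LSTM, Dropout, Dense

# dummy data shaped like the question: 4155 samples of 5x150, plus 4155 labels
X = np.random.random((4155, 5, 150))
y = np.random.randint(0, 2, size=(4155, 1))

model = Sequential()
# Conv1D treats axis 1 (length 5) as time steps and axis 2 (length 150) as channels
model.add(Conv1D(64, kernel_size=3, padding='same', activation='relu', input_shape=(5, 150)))
model.add(Bidirectional(LSTM(32)))        # returns only the final state of each direction
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(X, y, epochs=1, batch_size=32)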