I Have an auto encoder like neural network that i coded and 9000 training examples. I also have a numpy ndarray with 9000 arrays.
My goal is to tie the activations (not weights) of the middle layer (which is reshaped to the size of one of the arrays in the 9000 group).
I would like a situation where i randomly innitialise the numpy array. Then , using a batch size of 9000..average the values between the particular training examples activation in the middle layer , and its associated array out of the 9000 in the numpy array.
The numpy ndarray is to be used elsewhere in the program after training.
So this is what should happen should we have 1 training example, then we may have the numpy array be a (1,240,5) and the representation (activations) in the middle layer of the neural network after forward propagating become reshaped to (1,240,5).
so there is a neuron for each entry in the numpy ndarray. Given that after the first forward pass , the activations will be reshaped to an ndarray the same shape as the numpy array, i would like to average the values of those found already within the numpy array , and those that appeared after the first foward pass.
as training countinues on that one training example, the activations will change, and i would like to keep averaging the values at each iteration, untill training is complete.
this is more complex in the example with 9000 training examples where each example after the first pass has a seperate activation ndarray and must be averaged with its own external numpy array.
Again, the external numpy array is not a part of the neural network although its final values will depend on the training procedure.
i am using keras.
so the code could be something like this perhaps, using model.metrics_tensors and a minibatch of 1.
a=np.random.rand(9000, 240, 5)
Mirror=(Dense(240*5))
model = Sequential()
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2))) #7 x 7 x 64
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Flatten())
model.add(Mirror)
#decoder
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D((2,2))) # 14 x 14 x 128
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D((2,2))) # 28 x 28 x 64
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.compile(optimizer="adadelta", loss='mse',metrics=["accuracy"])
output_layers= ['Mirror']
model.metrics_names += 'Mirror'
model.metrics_tensors = [Mirror]
x=[layer.output for layer in model.layers if layer.name in output_layers]
del x[0]
x=np.asarray(x)
x=np.flatten(x)
x=np.ndarray.reshape(x,(1,240,5))
tf.keras.layers.average(x,a[0:1,:,:])
Related
:)
I have a Datset of ~16,000 .wav recording from 70 bird species.
I'm training a model using tensorflow to classify the mel-spectrogram of these recordings using Convolution based architectures.
One of the architectures used is simple multi-layer convolutional described below.
The pre-processing phase include:
extract mel-spectrograms and convert to dB Scale
segment audio to 1-second segment (pad with zero Or gaussian noise if residual is longer than 250ms, discard otherwise)
z-score normalization of training data - reduce mean and divide result by std
pre-processing while inference:
same as described above
z-score normalization BY training data - reduce mean (of training) and divide result by std (of training data)
I understand that the output layer's probabilities with sigmoid activation is not suppose to accumulate to 1, But I get many (8-10) very high prediction (~0.999) probabilities. and some is exactly 0.5.
The current test set correct classification rate is ~84%, tested with 10-fold cross validation, So it seems that the the network mostly operates well.
notes:
1.I understand there are similar features in the vocalization of different birds species, but the recieved probabilities doesn't seem to reflect them correctly
2. probabilities for example - a recording of natural noise:
Natural noise: 0.999
Mallard - 0.981
I'm trying to understand the reason for these results, if it's related the the data etc extensive mislabeling (probably not) or from another source.
Any help will be much appreciated! :)
EDIT: I use sigmoid because the probabilities of all classes are necessary, and I don't need them to accumulate to 1.
def convnet1(input_shape, numClasses, activation='softmax'):
# Define the network
model = tf.keras.Sequential()
model.add(InputLayer(input_shape=input_shape))
# model.add(Augmentations1(p=0.5, freq_type='mel', max_aug=2))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 1)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 1)))
model.add(Conv2D(128, (5, 5), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(256, (5, 5), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(Flatten())
# model.add(Dense(numClasses, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(numClasses, activation='sigmoid'))
model.compile(
loss='categorical_crossentropy',
metrics=['accuracy'],
optimizer=optimizers.Adam(learning_rate=0.001),
run_eagerly=False) # this parameter allows to debug and use regular functions inside layers: print(), save() etc..
return model
For future searches - this problem was solved, and the reason was found(!).
The initial batch size that was used was 256 or 512. reducing the batch size to 16 or 32 SOLVED THE PROBLEM, and now the difference in probabilities are as expected for training AND test set samples - very high for the correct label and very low for other classes.
I am getting confused with the filter paramater, which is the first parameter in the Conv2D() layer function in keras. As I understand the filters are supposed to do things like edge detection or sharpening the image or blurring the image, but when I am defining the model as
input_shape = (32, 32, 3)
model = Sequential()
model.add( Conv2D(64, kernel_size=(5, 5), activation='relu', input_shape=input_shape, strides=(1,1), padding='same') )
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(Conv2D(64, kernel_size=(5, 5), activation='relu', input_shape=input_shape, strides=(1,1), padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(Conv2D(128, kernel_size=(5, 5), activation='relu', input_shape=input_shape, strides=(1,1), padding='same'))
model.add(Flatten())
model.add(Dense(3072, activation='relu'))
model.add(Dense(2048, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
I am not mentioning the the edge detection or blurring or sharpening anywhere in the Conv2D function. The input images are 32 by 32 RGB images.
So my question is, when I define the Convolution layer as Conv2D(64, ...), does this 64 means 64 different types of filters, such as vertical edge, horizontal edge, etc, which are chosen by keras at random? if so then is the output of the convolution layer (with 64 filters and 5x5 kernel and 1x1 stride) on a 32x32 1-channel image is 64 images of 28x28 size each. How are these 64 images combined to form a single image for further layers?
The filters argument sets the number of convolutional filters in that layer. These filters are initialized to small, random values, using the method specified by the kernel_initializer argument. During network training, the filters are updated in a way that minimizes the loss. So over the course of training, the filters will learn to detect certain features, like edges and textures, and they might become something like the image below (from here).
It is very important to realize that one does not hand-craft filters. These are learned automatically during training -- that's the beauty of deep learning.
I would highly recommend going through some deep learning resources, particularly https://cs231n.github.io/convolutional-networks/ and https://www.youtube.com/watch?v=r5nXYc2wYvI&list=PLypiXJdtIca5sxV7aE3-PS9fYX3vUdIOX&index=3&t=3122s.
Just wanted to clarify what the output shape was.
Although jakub's answer was good, I don't think it addressed the "single image for further layers" part of the question.
I did a model.summary() to find out more.
I found that the shape returned from a Conv2D is (None, img_width, img_height, num_filters)
So when you pass the output of the Conv2D to MaxPooling you are passing that shape which means it is basically passing each entire convoluted image.
The other layers handle this gracefully. MaxPooling2D(2,2) returns the same shape but half the image size (None, img_width / 2, img_height / 2, num_filters).
Side note: I wish the filters was named num_filters because filters seems to imply you're passing in a list of filters in which to convolute the image.
I am trying to combine CNN and LSTM for image classification.
I tried the following code and I am getting an error. I have 4 classes on which I want to train and test.
Following is the code:
from keras.models import Sequential
from keras.layers import LSTM,Conv2D,MaxPooling2D,Dense,Dropout,Input,Bidirectional,Softmax,TimeDistributed
input_shape = (200,300,3)
Model = Sequential()
Model.add(TimeDistributed(Conv2D(
filters=16, kernel_size=(12, 16), activation='relu', input_shape=input_shape)))
Model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2),strides=2)))
Model.add(TimeDistributed(Conv2D(
filters=24, kernel_size=(8, 12), activation='relu')))
Model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2),strides=2)))
Model.add(TimeDistributed(Conv2D(
filters=32, kernel_size=(5, 7), activation='relu')))
Model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2),strides=2)))
Model.add(Bidirectional(LSTM((10),return_sequences=True)))
Model.add(Dense(64,activation='relu'))
Model.add(Dropout(0.5))
Model.add(Softmax(4))
Model.compile(loss='sparse_categorical_crossentropy',optimizer='adam')
Model.build(input_shape)
I am getting the following error:
"Input tensor must be of rank 3, 4 or 5 but was {}.".format(n + 2))
ValueError: Input tensor must be of rank 3, 4 or 5 but was 2.
I found a lot of problems in the code:
your data are in 4D so simple Conv2D are ok, TimeDistributed is not needed
your output is 2D so set return_sequences=False in the last LSTM cell
your last layers are very messy: no need to put a dropout between a layer output and an activation
you need categorical_crossentropy and not sparse_categorical_crossentropy because your target is one-hot encoded
LSTM expects 3D data. So you need to pass from 4D (the output of convolutions) to 3D. There are two possibilities you can adopt: 1) make a reshape (batch_size, H, W * channel); 2) (batch_size, W, H * channel). In this way, u have 3D data to use inside your LSTM
here a full model example:
def ReshapeLayer(x):
shape = x.shape
# 1 possibility: H,W*channel
reshape = Reshape((shape[1],shape[2]*shape[3]))(x)
# 2 possibility: W,H*channel
# transpose = Permute((2,1,3))(x)
# reshape = Reshape((shape[1],shape[2]*shape[3]))(transpose)
return reshape
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=(12, 16), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))
model.add(Conv2D(filters=24, kernel_size=(8, 12), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))
model.add(Conv2D(filters=32, kernel_size=(5, 7), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))
model.add(Lambda(ReshapeLayer)) # <========== pass from 4D to 3D
model.add(Bidirectional(LSTM(10, activation='relu', return_sequences=False)))
model.add(Dense(nclasses,activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='adam')
model.summary()
here the running notebook
I am building a model for autoencoder. I have a dataset of images(256x256) in LAB color space.
But i dont know, what is the right maximum compression point. I found example, when i have 176 x 176 x 1 (~30976), then the point is 22 x 22 x 512 (~247808).
But how is that calculated?
My model:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(256, 256, 1)))
model.add(MaxPooling2D((2,2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2,2)))
model.add(Conv2D(128, (3,3), activation='relu', padding='same'))
model.add(MaxPooling2D((2,2)))
model.add(Conv2D(256, (3,3), activation='relu', padding='same'))
model.add(MaxPooling2D((2,2)))
model.add(Conv2D(512, (3,3), activation='relu', padding='same'))
#Decoder
model.add(Conv2D(256, (3,3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(128, (3,3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(64, (3,3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(2, (3, 3), activation='tanh', padding='same'))
model.add(UpSampling2D((2, 2)))
model.compile(optimizer='adam', loss='mse' , metrics=['accuracy'])
model.summary()
Figuring out these aspects of a network is more art than mathematics. As such, we cannot define a constant compression point without properly analyzing the data, which is the reason why neural nets are used in the first place.
We can however intuitively consider what happens at every layer. For example, in an image colorization problem, it is better not to use too many pooling layers since that discards a huge amount of information. A max pooling layer of size 2x2 with a stride of 2 discards 75% of its input data. This is much more useful in classification to eliminate improbable classes. Similarly, ReLU discards all negative data, and may not be the best function choice for the problem at hand.
Here are a few tips that may help with your specific problem:
Reduce the number of pooling layers. Before pooling, try increasing the number of trainable layers so that the model (intuitively) learns to aggregate important information to avoid pooling it out.
Change the activation to elu, LeakyReLU or such, that do not eliminate negative values, especially since the output requires negative values as well.
Maybe try BiLinear or BiCubic upsampling to maintain structure? I'd also suggest taking a look at the so called "magic" kernel here. Personally, I've had good results with it, though it takes time to implement it efficiently.
If you have enough GPU space, increase the number of channels. This particular point does not have much to consider, except overfitting in some cases.
Preferably, use a Conv2D layer as the final layer to compensate for artifacts while upsampling.
Keep in mind that these points are for general use cases. Models in research papers are a different case, and are not as simple as your architecture. All these points may or may not apply to a specific paper.
I am working on a CNN model and want to add a new categorical feature before the Dense layer. I tried to concatenate the feature to the flattened output of CNN layer but looks like the concatenate function in Keras requires input of tensors and not arrays. How should I go about it? Here is the code that I have tried so far:
model = Sequential()
model.add(Conv2D(128, (6, 6), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(128, (6, 6)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
I am trying to use Concatenate function but it can join tensors, where as my feature is a numpy array of shape (1, 3). Any help would be appreciated.
You should create a new model on side of your actual model.
This second model will take in input your numpy array and does nothing else.
Then you concatenate them.
Like this ->
m1 = Sequential()
m1.add(Conv2D(128, (6, 6), padding='same'))
m1.add(Activation('relu'))
m1.add(Conv2D(128, (6, 6)))
m1.add(Activation('relu'))
m1.add(MaxPooling2D(pool_size=(2, 2)))
m1.add(Dropout(0.25))
m1.add(Flatten())
m2 = Sequential()
m2.add(Input()) # Put needed infos to input your numpy array
#Don't forget to flatten it if needed ?
model = Sequential()
model.add(Merge([m1,m2], mode='concat'))
#Then add your final layer.
#To train it, in place of the normal var X_train, you'll use [X_train,yournumpyarray] in model.train method