I have a neural network constructed in TensorFlow, and am using sentiment dimensions to try to construct a predictive model. A few dimensions are Anger, Sad, Joy, Surprise, Positive, Negative, etc. My aim is to combine different dimensions to see if I can find a non-linear relationship between them and what I am trying to predict (I'm using a self-organizing fuzzy neural network), e.g. 'Anger, Surprise', 'Anger, Sad', 'Sad, Joy, Surprise', etc.
What I have tried:
I generated all of the different combinations using the itertools library. I then created a function that takes in the columns I would like to try, splits my pandas dataframe into training and testing sets, trains the model, and returns an output.
I have tried calling this function using .map on a pandas dataframe with one column consisting of the combinations; I've also tried using a thread pool, and a simple loop over the list of combinations that calls my function. However, each call gets progressively slower, and I think this is due to the garbage collector not doing its job (my RAM usage also becomes very high). I then attempted to del all of the training and testing dataframes and the model after every function call, but it did not help.
tl;dr Is there a good way to try different combinations of inputs on a tensorflow model?
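For reference, a minimal sketch of generating the combinations (the column names and the `train_and_evaluate` helper are hypothetical stand-ins for your own); calling `tf.keras.backend.clear_session()` between runs is a common mitigation for the kind of creeping memory growth described here:

```python
from itertools import combinations

# Hypothetical sentiment columns; substitute your dataframe's actual columns.
dimensions = ['Anger', 'Sad', 'Joy', 'Surprise', 'Positive', 'Negative']

# Every non-empty subset of the dimensions, smallest subsets first.
all_combos = [
    combo
    for r in range(1, len(dimensions) + 1)
    for combo in combinations(dimensions, r)
]

# for combo in all_combos:
#     result = train_and_evaluate(list(combo))   # your train/test function
#     tf.keras.backend.clear_session()           # frees Keras graph state between
#                                                # runs, often curbing RAM growth
```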
Your problem seems complex - there are a lot of factors to determine.
Firstly - I suggest constructing a data structure for that type of operation, and doing some research in existing codebases on GitHub.
For strictly operating with images & face recognition - cv2 and OpenCV.
That type of model basically works on face recognition & positions of points in (x, y), and Euclidean distance based on them.
In the process of declaring and adjusting layers, it is possible to mix those layers -
(Conv64, Pooling, Conv128, Conv256, Conv512)
e.g.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(48,48,1)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(7, activation='softmax'))
(Reference: https://github.com/atulapra/Emotion-detection/blob/master/src/emotions.py)
And to define the previously mentioned factors, e.g.:
(0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral)
So after categorizing emotions:
There is a need for reading the position of points - point by point, something like:
for (x, y, w, h) in images:
Related
I wrote a very simple CNN using spectrogram images, but
the accuracy is only 0.3~0.4.
What other options can I add to improve accuracy?
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Activation, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=X_train.shape[1:], padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(128, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(64))
model.add(Dense(32))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(14))
model.add(Activation('softmax'))
With the information you provide, there is little chance to help you with the problem. The definition of your model looks correct (though note you are missing an activation function after the first dense layer - check whether that is intentional). So here are some considerations:
Do you train long enough? Your model is quite big and therefore needs a long time to converge AND a large dataset to train with.
Is your dataset large enough and contains enough variance? When your dataset doesn't represent your problem well, you can't train.
Take a look at the Loss curves of your validation AND training set. Are you overfitting/underfitting?
Do you correctly normalize and preprocess your dataset? Try to transform the values of the images to a range of -1 to 1 or 0 to 1 with a float datatype.
Is your dataset balanced? As you are softmaxing 14 classes, you need a balanced dataset in order to train every single class.
Hope this helped a little; if you need further help, please provide a detailed description of your problem and of what you are doing in your whole process.
:)
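The normalization point above can be sketched like this (a minimal NumPy example, assuming 8-bit grayscale spectrogram images):

```python
import numpy as np

# Hypothetical batch of 8-bit grayscale spectrogram images.
images = np.random.randint(0, 256, size=(4, 64, 64, 1), dtype=np.uint8)

# Scale to [0, 1] as float32 ...
x01 = images.astype(np.float32) / 255.0

# ... or to [-1, 1], another common convention.
x11 = x01 * 2.0 - 1.0
```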
I have a dataset of ~16,000 .wav recordings from 70 bird species.
I'm training a model using TensorFlow to classify the mel-spectrograms of these recordings using convolution-based architectures.
One of the architectures used is the simple multi-layer convolutional network described below.
The pre-processing phase includes:
extract mel-spectrograms and convert to dB scale
segment audio into 1-second segments (pad with zeros or Gaussian noise if the residual is longer than 250 ms, discard otherwise)
z-score normalization of the training data - subtract the mean and divide the result by the std
Pre-processing at inference time:
same as described above
z-score normalization BY the training data - subtract the mean (of the training data) and divide the result by the std (of the training data)
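The two normalization steps above can be sketched as follows (array shapes are illustrative, not the actual spectrogram dimensions):

```python
import numpy as np

# Hypothetical training and inference spectrogram batches.
train = np.random.rand(100, 128, 44).astype(np.float32)
test = np.random.rand(10, 128, 44).astype(np.float32)

# Statistics come from the training set only ...
mu, sigma = train.mean(), train.std()

# ... and are reused verbatim at inference time.
train_norm = (train - mu) / sigma
test_norm = (test - mu) / sigma
```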
I understand that the output layer's probabilities with sigmoid activation are not supposed to sum to 1, but I get many (8-10) very high predicted probabilities (~0.999), and some are exactly 0.5.
The current test-set correct classification rate is ~84%, tested with 10-fold cross validation, so it seems that the network mostly operates well.
notes:
1. I understand there are similar features in the vocalizations of different bird species, but the received probabilities don't seem to reflect them correctly
2. Example probabilities for a recording of natural noise:
Natural noise: 0.999
Mallard - 0.981
I'm trying to understand the reason for these results - whether it's related to the data, e.g. extensive mislabeling (probably not), or comes from another source.
Any help will be much appreciated! :)
EDIT: I use sigmoid because the probabilities of all classes are necessary, and I don't need them to sum to 1.
import tensorflow as tf
from tensorflow.keras import optimizers
from tensorflow.keras.layers import (InputLayer, Conv2D, BatchNormalization,
                                     MaxPooling2D, Flatten, Dropout, Dense)

def convnet1(input_shape, numClasses, activation='softmax'):
    # Define the network
    model = tf.keras.Sequential()
    model.add(InputLayer(input_shape=input_shape))
    # model.add(Augmentations1(p=0.5, freq_type='mel', max_aug=2))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 1)))
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 1)))
    model.add(Conv2D(128, (5, 5), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(256, (5, 5), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(Flatten())
    # model.add(Dense(numClasses, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(numClasses, activation='sigmoid'))
    model.compile(
        loss='categorical_crossentropy',
        metrics=['accuracy'],
        optimizer=optimizers.Adam(learning_rate=0.001),
        run_eagerly=False)  # set run_eagerly=True to debug and use regular
                            # functions such as print() or save() inside layers
    return model
For future searches - this problem was solved, and the reason was found(!).
The initial batch size used was 256 or 512. Reducing the batch size to 16 or 32 SOLVED THE PROBLEM, and now the differences in probabilities are as expected for both training and test set samples - very high for the correct label and very low for the other classes.
My actual concern is how to choose the input and output based on the data I have.
The shapes of x and y are ((90000, 6), (90000,)), and there are two labels in y.
My data is in a CSV file (I am using 6 columns as features and the last column as the label); I am not using image data.
import tensorflow as tf
from tensorflow.keras import models, layers

model = models.Sequential()
model.add(tf.keras.layers.DepthwiseConv2D((3, 3), padding='valid', depth_multiplier=10, input_shape=(,)))
# 2 Max Pooling layers and 1 DepthwiseConv2d
model.add(layers.Flatten())
model.add(layers.Dense(200, activation='relu'))
model.add(layers.Dense(2,activation='softmax'))
Can someone tell me how to decide the input shape, and what kind of reshaping I should do on the data before passing it into the model?
I am looking for suggestions on how to decide the input shape and what I should take care of.
Also, let me know if the last layer is correct.
I already posted one problem related to this, but this is a more simplified version of what I actually want to do.
Thanks in advance.
I am a little confused by the implementation here - why are you using two-dimensional CNNs? You could use tf.keras.layers.DepthwiseConv1D instead.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Dropout, MaxPooling1D, Flatten, Dense

# Note: Conv1D expects 3-D input, so a (90000, 6) feature matrix needs a
# channel axis first, e.g. x = x.reshape(-1, 6, 1)
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(x.shape[1],x.shape[2])))
model.add(Conv1D(filters=64, kernel_size=3, activation='relu'))
model.add(Dropout(0.5))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
I think something like that might solve your problem.
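As a minimal sketch of the reshaping step (pure NumPy; the shapes match the question's (90000, 6) features, and one-hot encoding the two labels is an assumption to match the softmax output with categorical crossentropy):

```python
import numpy as np

# Feature matrix and integer labels shaped as in the question.
x = np.random.rand(90000, 6).astype(np.float32)
y = np.random.randint(0, 2, size=(90000,))

# Add a channel axis so Conv1D sees (samples, timesteps, channels).
x = x.reshape(-1, 6, 1)

# One-hot encode the two labels for a softmax output + categorical crossentropy.
y_onehot = np.eye(2)[y]
```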
I am building an autoencoder model. I have a dataset of images (256x256) in the LAB color space.
But I don't know what the right maximum compression point is. I found an example where the input is 176 x 176 x 1 (~30,976 values) and the bottleneck is 22 x 22 x 512 (~247,808 values).
But how is that calculated?
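For what it's worth, the numbers in that example follow directly from the layer shapes: each 2x2 max-pool halves the spatial side, so three poolings take 176 down to 22, and the channel count of the last conv layer (512) accounts for the rest:

```python
# Input: 176 x 176 x 1
input_values = 176 * 176 * 1           # 30976

# Three 2x2 poolings halve the side each time: 176 -> 88 -> 44 -> 22
side = 176 // 2 // 2 // 2              # 22

# Bottleneck: 22 x 22 spatial with 512 channels
bottleneck_values = side * side * 512  # 247808
```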
My model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(256, 256, 1)))
model.add(MaxPooling2D((2,2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2,2)))
model.add(Conv2D(128, (3,3), activation='relu', padding='same'))
model.add(MaxPooling2D((2,2)))
model.add(Conv2D(256, (3,3), activation='relu', padding='same'))
model.add(MaxPooling2D((2,2)))
model.add(Conv2D(512, (3,3), activation='relu', padding='same'))
#Decoder
model.add(Conv2D(256, (3,3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(128, (3,3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(64, (3,3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(2, (3, 3), activation='tanh', padding='same'))
model.add(UpSampling2D((2, 2)))
model.compile(optimizer='adam', loss='mse' , metrics=['accuracy'])
model.summary()
Figuring out these aspects of a network is more art than mathematics. As such, we cannot define a constant compression point without properly analyzing the data, which is the reason why neural nets are used in the first place.
We can however intuitively consider what happens at every layer. For example, in an image colorization problem, it is better not to use too many pooling layers since that discards a huge amount of information. A max pooling layer of size 2x2 with a stride of 2 discards 75% of its input data. This is much more useful in classification to eliminate improbable classes. Similarly, ReLU discards all negative data, and may not be the best function choice for the problem at hand.
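The 75% figure can be verified with a tiny NumPy sketch of a 2x2, stride-2 max pool (a reshape trick, not the library implementation):

```python
import numpy as np

# A 4x4 single-channel input: 16 values.
x = np.arange(16).reshape(4, 4)

# 2x2 max pooling with stride 2: group into 2x2 blocks, keep each block's max.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))

# Only 4 of the 16 input values survive: 75% of the data is discarded.
```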
Here are a few tips that may help with your specific problem:
Reduce the number of pooling layers. Before pooling, try increasing the number of trainable layers so that the model (intuitively) learns to aggregate important information to avoid pooling it out.
Change the activation to ELU, LeakyReLU, or similar functions that do not eliminate negative values, especially since the output requires negative values as well.
Maybe try bilinear or bicubic upsampling to maintain structure? I'd also suggest taking a look at the so-called "magic" kernel here. Personally, I've had good results with it, though it takes time to implement efficiently.
If you have enough GPU space, increase the number of channels. This particular point does not have much to consider, except overfitting in some cases.
Preferably, use a Conv2D layer as the final layer to compensate for artifacts while upsampling.
Keep in mind that these points are for general use cases. Models in research papers are a different case, and are not as simple as your architecture. All these points may or may not apply to a specific paper.
I am trying to implement the model from the article (https://arxiv.org/abs/1411.4389), which basically consists of time-distributed CNNs followed by a sequence of LSTMs, using Keras with TF.
However, I am having trouble figuring out whether I should apply the TimeDistributed wrapper just to my convolutional & pooling layers or also to the LSTMs.
Is there a way to run the CNN layers in parallel (based on the number of frames in the sequence that I want to process and on the number of cores that I have)?
And last, suppose that each entry is composed of "n" frames (in sequence), where n varies based on the current data entry: what is the best suitable input dimension? Would "n" be the batch size? Is there a way to limit the number of CNNs running in parallel to, for example, 4 (so that you get an output Y after 4 frames are processed)?
P.S.: The inputs are small videos (i.e. a sequence of frames)
P.S.: The output dimension is irrelevant to my question, so it is not discussed here
Thank you
[Edited]
Sorry, a link-only answer was bad. So I will try to answer the questions one by one.
Should I include the TimeDistributed wrapper just for my convolutional & pooling layers or also for the LSTMs?
Use the TimeDistributed wrapper only for the Conv and Pooling layers; there is no need for it on the LSTMs.
Is there a way to run the CNN layers in parallel?
No, if you use a CPU. It's possible if you utilize a GPU.
Transparent Multi-GPU Training on TensorFlow with Keras
What is the best suitable input dimension?
Five: (batch, time, width, height, channel).
Is there a way to limit the number of CNNs running in parallel to, for example, 4?
You can do this in preprocessing by manually aligning frames into a specific number, not in the network. In other words, the "time" dimension should be 4 if you want to have output after 4 frames are processed.
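That frame-alignment preprocessing can be sketched as follows (a minimal NumPy example; the fixed length of 4 and the clip shapes are illustrative):

```python
import numpy as np

def align_frames(frames, num_frames=4):
    """Truncate or zero-pad a (n, width, height, channels) clip to num_frames."""
    n = frames.shape[0]
    if n >= num_frames:
        return frames[:num_frames]
    pad = np.zeros((num_frames - n,) + frames.shape[1:], dtype=frames.dtype)
    return np.concatenate([frames, pad], axis=0)

# Clips of varying length all end up with a "time" dimension of 4.
short_clip = np.ones((2, 8, 8, 1), dtype=np.float32)
long_clip = np.ones((7, 8, 8, 1), dtype=np.float32)
```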
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (TimeDistributed, Conv2D, MaxPooling2D,
                                     Flatten, Dropout, LSTM, Dense)

model = Sequential()
model.add(
    TimeDistributed(
        Conv2D(64, (3, 3), activation='relu'),
        input_shape=(data.num_frames, data.width, data.height, 1)
    )
)
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(1, 1))))
model.add(TimeDistributed(Conv2D(128, (4, 4), activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Conv2D(256, (4, 4), activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
# extract features and dropout
model.add(TimeDistributed(Flatten()))
model.add(Dropout(0.5))
# input to LSTM
model.add(LSTM(256, return_sequences=False, dropout=0.5))
# classifier with sigmoid activation for multilabel
model.add(Dense(data.num_classes, activation='sigmoid'))
Reference:
PRI-MATRIX FACTORIZATION - BENCHMARK