Multilabel text classification with 1D CNN - Python

I'm working on an authorship-detection task. The data is a data frame of symbol (character) n-gram features built with TfidfVectorizer from about 110k texts by 147 different authors. I encode the author-name strings with sklearn's LabelEncoder, which converts the strings to integers.
The data is split into train and test sets with these shapes: (99817, 1000) (11091, 1000) (99817,) (11091,)
Using the model below, my best results came after 7-8 epochs: loss: 0.7225 - accuracy: 0.8070 - val_loss: 1.3828 - val_accuracy: 0.6777; after that the model starts to overfit.
Model:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout

model = Sequential([
    Dense(300, activation="relu", input_shape=(Data_train.shape[-1],)),
    Dense(750, activation="relu"),
    BatchNormalization(),
    Dropout(0.5),
    Dense(147, activation="softmax"),
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
history = model.fit(Data_train,
                    Labels_train,
                    epochs=10,
                    shuffle=True,
                    # early_stopping is the EarlyStopping callback defined further below;
                    # validation data is presumably passed as well, since val_loss is reported
                    callbacks=[early_stopping])
I want to try to solve this task with a CNN. I found an example similar to my task using a 1D network and adapted it:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Dropout, Conv1D, MaxPooling1D,
                                     BatchNormalization, Flatten, Dense)
from tensorflow.keras.callbacks import EarlyStopping

vocab_size = 1000
maxlen = 1000
batch_size = 32
embedding_dims = 10
filters = 16
kernel_size = 3
hidden_dims = 250
epochs = 10
# note: patience is measured in epochs and should be an integer, e.g. patience=2
early_stopping = EarlyStopping(patience=0.1)
model = Sequential([
    Embedding(vocab_size, embedding_dims, input_length=maxlen),
    Dropout(0.5),
    Conv1D(filters, kernel_size, padding='valid', activation='relu'),
    MaxPooling1D(),
    BatchNormalization(),
    Conv1D(filters, kernel_size, padding='valid', activation='relu'),
    MaxPooling1D(),
    Flatten(),
    Dense(hidden_dims, activation='relu'),
    Dropout(0.5),
    Dense(147, activation='softmax')
])
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
model.fit(Data_train, Labels_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2,
          callbacks=[early_stopping])
I only manage to get these results: from the first epoch to the fifth, accuracy and val_accuracy reach ~0.07 and stay there. After 5 epochs this is what I get:
loss: 4.5621 - accuracy: 0.0701 - val_loss: 4.5597 - val_accuracy: 0.0702
Could someone help me improve these models, especially the CNN, to get better results? Any suggestions are welcome. If I need to provide anything more, please let me know. Thank you.

I have managed to solve my issue. Instead of using an embedding layer (which expects integer token indices rather than TF-IDF floats), I transformed my data and labels by adding an extra dimension and passed the input shape directly to the convolutional layer:
# add a channel dimension so Conv1D sees shape (samples, features, 1)
Data_train_tmp = tf.expand_dims(Data_train, -1)
Labels_train_tmp = tf.expand_dims(Labels_train, 1)
# first layer of the network; the input shape goes straight to the Conv1D layer
Conv1D(62, 5, activation='relu', input_shape=(Data_train.shape[1], 1)),
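For completeness, a minimal sketch of how the reworked model can fit together end to end; apart from the Conv1D line above, the layer sizes and training settings below are illustrative rather than the exact ones used.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, Dropout

# TF-IDF features get a channel dimension: (samples, 1000) -> (samples, 1000, 1)
Data_train_tmp = tf.expand_dims(Data_train, -1)

model = Sequential([
    Conv1D(62, 5, activation='relu', input_shape=(Data_train.shape[1], 1)),
    MaxPooling1D(),
    Conv1D(62, 5, activation='relu'),   # second conv block is illustrative
    MaxPooling1D(),
    Flatten(),
    Dense(250, activation='relu'),
    Dropout(0.5),
    Dense(147, activation='softmax')
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
model.fit(Data_train_tmp, Labels_train, epochs=10, validation_split=0.2)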

Related

Watermark binary classifier in Tensorflow Keras stuck

My goal is to create a model that can classify pictures depending on whether ONE particular watermark is present or not. If I wanted to check for a different watermark, ideally I would create another dataset with that new watermark and retrain the model. As I understand it, this is a binary classifier.
Is this the right approach?
I am stuck with my model to identify whether a picture has a watermark on it or not. My metrics don't move from the values below. Example:
loss: 0.6931 - accuracy: 0.5000 - val_loss: 0.6931 - val_accuracy: 0.5000
I have prepared a data folder structure like:
Training/
    Watermark/
    No_watermark/
Validation/
    Watermark/
    No_watermark/
I have used a dataset with 1000 images in each category. Here is an example of my dataset with my own watermark:
https://drive.google.com/file/d/1JBdbIw1yehx9XX9S6X7esVhVL8NG1dAK/view?usp=sharing
https://drive.google.com/file/d/14Rxul13zGzXgKD9GZeudn_K69BRBJ1tR/view?usp=sharing
https://drive.google.com/file/d/1oeXxSjppDMScoj04hzEEl3587ccCFqrB/view?usp=sharing
I hope you can help with this....
How can I change my model to "recognize" the watermark?
Why do my "loss" and "accuracy" not move even if I change the image size, epochs, dataset?
Should I just train the model with just the watermark image with augmentation and no background?
import tensorflow as tf

# train_generator / validation_generator come from ImageDataGenerator.flow_from_directory
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(250, 250, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
history = model.fit(train_generator,
                    epochs=25,
                    validation_data=validation_generator,
                    verbose=1,
                    validation_steps=3)
Thanks
Since you're performing a binary classification, have you set the class_mode parameter in the ImageDataGenerator.flow_from_directory method to 'binary'? The default is 'categorical', which is not what you should be using here since you have a single output node.
It's a common pitfall. I'm guessing the value of accuracy is 0.5 at the start because you likely have equal number of watermarked vs non-watermarked images, and the performance never improves because you've passed the wrong value of class_mode.
TL;DR: Set class_mode='binary' (instead of the default class_mode='categorical') in flow_from_directory.
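For reference, a sketch of the generator setup with that fix applied (the directory name and image size are taken from the question; the rescaling is an assumption):
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255)

# class_mode='binary' yields 0/1 labels that match the single sigmoid output
train_generator = train_datagen.flow_from_directory(
    'Training',
    target_size=(250, 250),
    batch_size=32,
    class_mode='binary')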

Not able to find a proper CNN

I am using Keras/TensorFlow in Colab and I am working on the oxford_flowers102 dataset. The task is image classification, with quite a lot of categories (102) and not many images per class. I tried to build different neural networks, starting from simple ones and moving to more complex ones, with and without image augmentation, dropout, hyperparameter tuning, batch-size adjustment, optimizer adjustment, image resizing... However, I was not able to find a CNN that gives me an acceptable val_accuracy and, ultimately, a good test accuracy. Up to now the best val_accuracy I could get was a poor 0.3x. I am pretty sure it is possible to get better results; I am just not finding the right CNN setup. My code so far:
import tensorflow as tf
from keras.models import Model
import tensorflow_datasets as tfds
import tensorflow_hub as hub
# update colab tensorflow_datasets to current version 3.2.0,
# otherwise tfds.load will lead to error when trying to load oxford_flowers102 dataset
!pip install tensorflow_datasets --upgrade
# restart runtime
oxford, info = tfds.load("oxford_flowers102", with_info=True, as_supervised=True)
train_data=oxford['train']
test_data=oxford['test']
validation_data=oxford['validation']
IMG_SIZE = 224
def format_example(image, label):
    image = tf.cast(image, tf.float32)
    image = image / 255.0
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image, label
train = train_data.map(format_example)
validation = validation_data.map(format_example)
test = test_data.map(format_example)
BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = 1000
train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
test_batches = test.batch(BATCH_SIZE)
validation_batches = validation.batch(BATCH_SIZE)
First model I tried:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(102)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_batches, validation_data=validation_batches, epochs=20)
Epoch 20/20 32/32 [==============================] - 4s 127ms/step - loss: 2.9830 - accuracy: 0.2686 - val_loss: 4.8426 - val_accuracy: 0.0637
When I run it for more epochs, it overfits: val_loss goes up and val_accuracy does not improve.
Second model (very simple one):
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(102)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_batches, validation_data=validation_batches, epochs=20)
Does not work at all, loss stays at 4.6250.
Third model:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(102)
])
base_learning_rate = 0.0001
model.compile(optimizer=tf.optimizers.RMSprop(lr=base_learning_rate),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_batches, validation_data=validation_batches, epochs=20)
The model overfits; val_accuracy does not go above 0.15.
I added dropout layers to this model (trying different rates) and also adjusted the kernels. However, there was no real improvement. I also tried the Adam optimizer.
Fourth model:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(256, (3, 3), activation='relu'),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(102)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_batches, validation_data=validation_batches, epochs=20)
Same problem again, no good val_accuracy. Also tried it with RMSprop optimizer. Not able to get a val_accuracy higher than 0.2.
Fifth model:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (2, 2), activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(102)
])
base_learning_rate = 0.001
model.compile(optimizer=tf.optimizers.RMSprop(lr=base_learning_rate),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_batches, validation_data=validation_batches, epochs=250)
val_accuracy peaks at around 0.3x. I also tried it with Adam.
When I tried transfer learning with MobileNet, I immediately got 0.7x within 10 epochs. So why am I not able to get close to that with a self-built CNN? I don't expect 0.8 or to beat MobileNet, but where is my mistake? What would a self-built CNN look like that gets, say, 0.6-0.7 val_accuracy?
You can try using a predefined CNN and optimizing its hyperparameters with a metaheuristic optimizer such as the Grey Wolf Optimizer or PSO.
It's not entirely clear from your question: are you concerned that your model architecture is inferior to that of say MobileNet's, or that your performance is not comparable to that of transfer learning with MobileNet?
In response to the first, in general, the popular architectures such as ResNet, MobileNet, AlexNet are very cleverly crafted networks and so are likely to better represent data than a hand-defined network unless you do something very clever yourself.
In response to the second, the more complex a model gets, the more data it needs to be trained well, so that it is neither underfit nor overfit. This poses a problem on datasets such as yours (with a few thousand images), because it is difficult for a complex CNN to learn meaningful rules (kernels) for extracting information from images in general without instead learning rules for memorizing the limited set of training inputs. In summary, you want a larger model to make more accurate predictions, but this in turn requires more data, which sometimes you don't have. I suspect that if you compared an untrained MobileNet with your untrained network on the oxford_flowers102 dataset, you'd see similarly poor performance.
Enter transfer learning. By pretraining relatively large models on relatively huge datasets (most are pretrained on ImageNet, which has millions of images), the model learns to extract relevant information from arbitrary images much better than it could on a smaller dataset. These general rules for feature extraction apply to your smaller dataset as well, so with just a bit of fine-tuning the transfer-learning model will likely far outperform any model trained solely on your dataset.
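To make the transfer-learning route concrete, here is a minimal sketch using a frozen MobileNetV2 feature extractor from tf.keras.applications; the classification head and hyperparameters are illustrative, and train_batches/validation_batches are the pipelines from the question.
import tensorflow as tf

# MobileNetV2 pretrained on ImageNet, used as a frozen feature extractor
base_model = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(102)  # logits for the 102 flower classes
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
# note: MobileNetV2 was trained on inputs scaled to [-1, 1]
# (tf.keras.applications.mobilenet_v2.preprocess_input), so adjust format_example accordingly
history = model.fit(train_batches, validation_data=validation_batches, epochs=10)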

Keras neural network takes only a few samples to train

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

data = np.random.random((10000, 150))
labels = np.random.randint(10, size=(10000, 1))
labels = to_categorical(labels, num_classes=10)
model = Sequential()
model.add(Dense(units=32, activation='relu', input_shape=(150,)))
model.add(Dense(units=10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels, epochs=30, validation_split=0.2)
I created 10000 random samples to train my net, but it seems to use only a few of them (250/10000).
Example of the 1st epoch:
Epoch 1/30
250/250 [==============================] - 0s 2ms/step - loss: 2.1110 - accuracy: 0.2389 - val_loss: 2.2142 - val_accuracy: 0.1800
Your data is split into training and validation subsets (validation_split=0.2).
Training subset has size 8000 and validation 2000.
Training goes in batches, each batch has size 32 samples by default.
So one epoch takes 8000/32 = 250 batches; the 250 shown in the progress bar counts batches, not samples.
Try code like the following example:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# Generate dummy data
data = np.random.random((1000, 100))
labels = np.random.randint(10, size=(1000, 1))
# Convert labels to categorical one-hot encoding
one_hot_labels = keras.utils.to_categorical(labels, num_classes=10)
# Train the model, iterating on the data in batches of 32 samples
model.fit(data, one_hot_labels, epochs=10, batch_size=32)

Fine tuning CNN hyperparameters for complex text classification

I'm working on a CNN model for complex text classification (mainly emails and messages). The dataset contains around 100k entries distributed across 10 different classes. My current Keras Sequential model has the following structure:
model = Sequential([
    Embedding(input_dim=10000,
              output_dim=150,
              input_length=400),
    Convolution1D(filters=128,
                  kernel_size=4,
                  padding='same',
                  activation='relu'),
    BatchNormalization(),
    MaxPooling1D(),
    Flatten(),
    Dropout(0.4),
    Dense(100, activation='relu'),
    Dropout(0.4),
    Dense(len(y_train[0]), activation='softmax')
])
To compile the model I'm using the Nadam optimizer and categorical_crossentropy loss with label smoothing set to 0.2.
For model.fit I'm using 30 epochs and a batch size of 512. I also use EarlyStopping monitoring val_loss with the patience set to 8 epochs. The test size is 25% of the dataset.
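For reference, the compile/fit setup described above would look roughly like this; it is a sketch, and the X_train/y_train/X_test/y_test variable names are assumed rather than taken from the original code.
import tensorflow as tf

model.compile(optimizer=tf.keras.optimizers.Nadam(),
              loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.2),
              metrics=['accuracy'])

early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=8)
history = model.fit(X_train, y_train,
                    epochs=30,
                    batch_size=512,
                    validation_data=(X_test, y_test),  # the 25% held-out split
                    callbacks=[early_stopping])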
Currently training stops after 16-18 epochs; the values start to fluctuate a little after epoch 6-7 and then carry on until EarlyStopping kicks in. On average the values look like this:
loss: 1.1673 - accuracy: 0.9674 - val_loss: 1.2464 - val_accuracy: 0.8964
with a testing accuracy reaching:
loss: 1.2461 - accuracy: 0.8951
Now I'd like to improve the accuracy of my CNN. I've tried different hyperparameters, but so far I haven't been able to get a higher value. Therefore I'm trying to figure out:
whether there is still room for improvement (I bet so);
whether the solution lies in fine-tuning my hyperparameters and, if so, which ones I should change;
whether going deeper by adding layers to the model could be of any use and, if so, how to improve my model;
whether there is any other deep-learning/neural-network approach besides a CNN that could lead to a better result.
Thank you very much to anybody who will help! :)
There are many libraries, but I find this one very flexible: https://github.com/keras-team/keras-tuner
Just install it with pip.
Here is your model updated for the tuner; feel free to adjust the search ranges.
from tensorflow import keras
from tensorflow.keras import layers
from kerastuner.tuners import RandomSearch

def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Embedding(input_dim=hp.Int('input_dim',
                                                min_value=5000,
                                                max_value=10000,
                                                step=1000),
                               output_dim=hp.Int('output_dim',
                                                 min_value=200,
                                                 max_value=800,
                                                 step=100),
                               input_length=400))
    model.add(layers.Convolution1D(filters=hp.Int('filters',
                                                  min_value=32,
                                                  max_value=512,
                                                  step=32),
                                   kernel_size=hp.Int('kernel_size',
                                                      min_value=3,
                                                      max_value=11,
                                                      step=2),
                                   padding='same',
                                   activation='relu'))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling1D())
    model.add(layers.Flatten())
    model.add(layers.Dropout(0.4))
    model.add(layers.Dense(units=hp.Int('units',
                                        min_value=64,
                                        max_value=256,
                                        step=32),
                           activation='relu'))
    model.add(layers.Dropout(0.4))
    # the output size is the number of classes, i.e. len(y_train[0]), not y_train[0]
    model.add(layers.Dense(len(y_train[0]), activation='softmax'))
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])),
        loss='categorical_crossentropy',
        metrics=['accuracy'])
    return model
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=5,
    executions_per_trial=3,
    directory='my_dir',
    project_name='helloworld')
tuner.search_space_summary()
## The following lines are based on your model
tuner.search(x, y,
             epochs=5,
             validation_data=(val_x, val_y))
models = tuner.get_best_models(num_models=2)
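Once the search finishes, you can inspect the trials and rebuild the best configuration for a longer training run; a small usage sketch using standard Keras Tuner calls:
tuner.results_summary()                        # print the best trials
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
best_model = tuner.hypermodel.build(best_hp)   # rebuild with the best hyperparameters, then retrain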
You can try replacing the Conv1D layers with LSTM layers and observe if you get better performance.
LSTM(units = 512) https://keras.io/layers/recurrent/
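For example, a rough sketch of the question's model with the convolutional block swapped out for an LSTM (layer sizes are illustrative):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

model = Sequential([
    Embedding(input_dim=10000, output_dim=150, input_length=400),
    LSTM(512),                      # replaces Conv1D + pooling + Flatten
    Dropout(0.4),
    Dense(100, activation='relu'),
    Dropout(0.4),
    Dense(len(y_train[0]), activation='softmax')   # y_train as in the question
])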
If you want to extract more meaningful features, one approach I found promising is by extracting pre-trained BERT features and then training using a CNN/LSTM.
A great repository to get started is this one -
https://github.com/UKPLab/sentence-transformers
Once you get the sentence embeddings from BERT/XLNet, you can use those features to train another CNN similar to the one you are using, except maybe get rid of the embedding layer, as it's expensive.
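A minimal sketch of extracting such sentence embeddings with the sentence-transformers package (the model name is just one of the pretrained options it ships with):
from sentence_transformers import SentenceTransformer

# pretrained BERT-based sentence encoder
encoder = SentenceTransformer('bert-base-nli-mean-tokens')
sentences = ["Please find the invoice attached.", "The meeting moved to 3pm tomorrow."]
embeddings = encoder.encode(sentences)   # numpy array of shape (n_sentences, 768)
# these fixed-size vectors can then be fed to a small dense/CNN/LSTM classifier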

Tensorflow image classification binary crossentropy loss is negative

I'm new to Tensorflow. I followed some tutorials with a provided dataset and wanted to try something on my own. I decided I'd try to classify Magic the Gathering sets. Each card has a symbol in different colors on it: Black, Gold and so on.
The colors don't matter, just the different symbols. So I created a dataset of 3 different sets (so 3 different symbols) and got around 15'000 images like this. Some are a little bit rotated, some have an X and Y offset, just to get some different images.
Then I adapted the tutorial on the tensorflow website for image classification. Instead of two classes I wanted to try three:
batch_size = 250
epochs = 3
IMG_HEIGHT = 55
IMG_WIDTH = 55
train_image_generator = ImageDataGenerator(rescale=1./255)
validation_image_generator = ImageDataGenerator(rescale=1./255)
train_data_gen = train_image_generator.flow_from_directory(batch_size=batch_size,
                                                           directory=train_dir,
                                                           shuffle=True,
                                                           target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                           class_mode='binary')
val_data_gen = validation_image_generator.flow_from_directory(batch_size=batch_size,
                                                              directory=validation_dir,
                                                              target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                              class_mode='binary')
model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
history = model.fit_generator(
    train_data_gen,
    steps_per_epoch=total_train // batch_size,
    epochs=epochs,
    validation_data=val_data_gen,
    validation_steps=total_val // batch_size,
    callbacks=[cp_callback]
)
But my loss is negative and I don't get a good accuracy after training. What did I mess up? Is the model used in the tutorial not suited to my use case? Or is there an error in the code because I used three classes instead of two?
The model from the tutorial was used for binary classification (only two classes, cat or dog). You on the other hand want to classify 3 classes not 2. Therefore you have to adapt the architecture a little bit. Your last layer should be:
Dense(3, activation='softmax')
Three neurons because you have three classes, and softmax activation because you want your outputs to be valid probabilities. To compile the model, use categorical_crossentropy instead of binary_crossentropy and make sure your labels are one-hot encoded. Also, for your ImageDataGenerator you should pass class_mode='categorical' to the .flow_from_directory() function.
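Concretely, the pieces that change would look roughly like this (the convolutional layers stay as in the question):
train_data_gen = train_image_generator.flow_from_directory(batch_size=batch_size,
                                                           directory=train_dir,
                                                           shuffle=True,
                                                           target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                           class_mode='categorical')  # one-hot labels

model = Sequential([
    # ... same Conv2D / MaxPooling2D stack as before ...
    Flatten(),
    Dense(512, activation='relu'),
    Dense(3, activation='softmax')   # one output per class
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])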
