classifier after binary classifier - python

I have some images which contain some structure (a building, for example).
I have the pixel data and the label data.
The label data are in the form:
np.array([[0, nan, nan],
          [1, 2, 1],
          [1, 3, 2]])
The first column means: is there a structure or not? (1 or 0)
The second: what type is the structure? (types: 1, 2, 3)
The third: how many structures are there?
When we don't have a structure, all other values are nan.
So, for the example above we have:
first line: no structure
second line: we have one structure of type 2
third line: we have two structures of type 3
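(For reference, a minimal sketch of how such a label array could be split into the three targets; the variable names are my own.)
import numpy as np

labels = np.array([[0, np.nan, np.nan],
                   [1, 2, 1],
                   [1, 3, 2]])

has_structure = labels[:, 0]                 # 0/1 target for the binary classifier
structure_rows = has_structure == 1          # rows that actually contain a structure
structure_type = labels[structure_rows, 1]   # 1, 2 or 3, only defined where a structure exists
structure_count = labels[structure_rows, 2]  # how many structures, same restriction

print(structure_type)   # [2. 3.]
print(structure_count)  # [1. 2.]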
So, first I am doing a binary classification to find out if we have a structure or not.
I am using VGG16 with pretrained ImageNet weights.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Dropout, Flatten, Input
from tensorflow.keras.models import Model

imagenet_weights = './vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5'
base_vgg16 = VGG16(include_top=False, weights=imagenet_weights, input_shape=input_shape)
last_layer = base_vgg16.output
x = Flatten()(last_layer)
x = Dense(512, activation='relu')(x)
x = Dropout(0.3)(x)
preds = Dense(1, activation='sigmoid')(x)
base_vgg16.trainable = False
model_vgg16 = Model(base_vgg16.input, preds)
For training I am using 3-fold cross-validation and data augmentation.
This gives 97% accuracy.
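(For context, a rough sketch of what that training stage might look like; X and y_binary are my own placeholder names for the images and the 0/1 labels, and the augmentation parameters are arbitrary.)
from sklearn.model_selection import KFold
from tensorflow.keras.preprocessing.image import ImageDataGenerator

model_vgg16.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

augmenter = ImageDataGenerator(rotation_range=15, horizontal_flip=True)

kfold = KFold(n_splits=3, shuffle=True, random_state=42)
for train_idx, val_idx in kfold.split(X):
    X_tr, y_tr = X[train_idx], y_binary[train_idx]
    X_va, y_va = X[val_idx], y_binary[val_idx]
    # a real cross-validation run would rebuild the model inside the loop
    model_vgg16.fit(augmenter.flow(X_tr, y_tr, batch_size=32),
                    validation_data=(X_va, y_va),
                    epochs=10)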
Now, I want to run a second classification in order to find out the type, based on the binary classification results.
So, I have:
pred_val = model_vgg16.predict(X_val)
Now, I am not sure how to proceed.
What should I use as input for the classifier?
I tried:
X_train = X_train[np.where(pred_val >= 0.5), :]
or
pred_train = model_vgg16.predict(X_train)
X_train = X_train[np.where(pred_train.squeeze() >= 0.5), :]
(I am not sure whether it even makes sense to predict on the training data.)
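(As an illustration of the masking step: a boolean mask keeps the images and their type labels aligned, whereas indexing with the tuple returned by np.where on a 2-D prediction array adds an extra dimension. y_type is my own placeholder name for the type column of the same rows as X_train.)
pred_train = model_vgg16.predict(X_train)   # shape (n, 1)
mask = pred_train.squeeze() >= 0.5          # boolean vector of length n

X_train_structs = X_train[mask]             # images predicted to contain a structure
y_type_structs = y_type[mask]               # the matching type labels (1, 2 or 3)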
Now, no matter how I run the second network, I always get poor results:
Val_loss is almost steady around 10-11
Train_loss is around 10-12
Val_acc is almost steady around 0.25-0.35
Train_acc is around 0.28-0.38
For the second network:
inputs = Input(shape=input_shape)
x = Flatten()(inputs)
x = Dense(512, activation='relu')(x)
x = Dense(256, activation='relu')(x)
preds = Dense(5, activation='softmax')(x)
model = Model(inputs, preds)
Whatever combination I try, with more or fewer units (I only tried a few, like x = Dense(8, activation='relu')(x)),
and whatever batch size, optimizer, and learning rate I use, the result is the same.

Related

Transformer model is very slow and doesn't predict well

I created my first transformer model, after having worked so far with LSTMs. I created it for multivariate time series predictions: I have 10 different meteorological features (temperature, humidity, wind speed, pollution concentration, among others) and with them I am trying to predict time sequences (24 consecutive values/hours) of air pollution. So my input has the shape X.shape = (75575, 168, 10) - 75575 time sequences, each sequence contains 168 hourly entries/vectors, and each vector contains 10 meteorological features. My output has the shape y.shape = (75575, 24) - 75575 sequences, each containing 24 consecutive hourly values of the air pollution concentration.
I took as a starting point an example from the official Keras site. It was created for classification problems; I only took out the softmax activation, set the number of neurons in the last dense layer to 24, and hoped it would work. It runs and trains, but it doesn't do a better job than the LSTMs I have used on the same problem and, more importantly, it is very slow: 4 min/epoch. Below I attach the model, and I would like to know:
I) Have I done something wrong in the model? Can the accuracy or speed be improved? Are there maybe some other parts of the code I need to change for it to work on regression rather than classification problems?
II) Also, can a transformer work at all on multivariate problems of my kind (10 features in, 1 feature out), or do transformers only work on univariate problems? Thanks.
from tensorflow import keras
from tensorflow.keras import layers

def build_transformer_model(input_shape, head_size, num_heads, ff_dim, num_transformer_blocks,
                            mlp_units, dropout=0, mlp_dropout=0):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for _ in range(num_transformer_blocks):
        # Normalization and Attention
        x = layers.LayerNormalization(epsilon=1e-6)(x)
        x = layers.MultiHeadAttention(
            key_dim=head_size, num_heads=num_heads, dropout=dropout
        )(x, x)
        x = layers.Dropout(dropout)(x)
        res = x + inputs
        # Feed Forward Part
        x = layers.LayerNormalization(epsilon=1e-6)(res)
        x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(x)
        x = layers.Dropout(dropout)(x)
        x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
        x = x + res
    x = layers.GlobalAveragePooling1D(data_format="channels_first")(x)
    for dim in mlp_units:
        x = layers.Dense(dim, activation="relu")(x)
        x = layers.Dropout(mlp_dropout)(x)
    x = layers.Dense(24)(x)
    return keras.Model(inputs, x)
model_tr = build_transformer_model(input_shape=(window_size, X_train.shape[2]), head_size=256,
                                   num_heads=4, ff_dim=4, num_transformer_blocks=4,
                                   mlp_units=[128], mlp_dropout=0.4, dropout=0.25)
model_tr.compile(loss="mse", optimizer='adam')
m_tr_history = model_tr.fit(x=X_train, y=y_train, validation_split=0.25, batch_size=64,
                            epochs=10, callbacks=[modelsave_cb])
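(As a side note: before a long training run, the architecture can be sanity-checked on a handful of dummy samples with the stated shapes. This is only a sketch; it assumes build_transformer_model is defined as above.)
import numpy as np

X_dummy = np.random.rand(32, 168, 10).astype("float32")  # 32 sequences, 168 hours, 10 features
y_dummy = np.random.rand(32, 24).astype("float32")       # 24 hourly pollution values per sequence

check_model = build_transformer_model(input_shape=(168, 10), head_size=256, num_heads=4,
                                      ff_dim=4, num_transformer_blocks=4,
                                      mlp_units=[128], mlp_dropout=0.4, dropout=0.25)
check_model.compile(loss="mse", optimizer="adam")
check_model.fit(X_dummy, y_dummy, epochs=1, batch_size=8)
print(check_model.predict(X_dummy).shape)                 # expected: (32, 24)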

Unable to completely separate outputs of model in TensorFlow

I am trying to create a convolutional neural network that has two regression outputs, a score and a confidence. I have frozen the layers they have in common in the hope that adding the confidence output doesn't change the score, but in my experiments it does. For the model with just the score, I used Xception, added a simple GlobalAveragePooling2D and Dense(512) layer, and then output a single number.
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

base_model = Xception(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
    layer.trainable = False
optimizer = Adam(learning_rate=learning_rate)
model.compile(loss='mae', optimizer=optimizer, metrics=['mse', 'mae'], run_eagerly=True)
Here is what the end of model.summary() looks like (screenshot not reproduced here).
When I fit it, the model produces good results.
But when I try to add a second output, the result of the first becomes much worse. The new model is trained on tuples where the first number is the same target as in the first model and the second number is a confidence value. The model is very similar to the one above.
base_model = Xception(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
score_x = Dense(512, activation='relu')(x)
score_out = Dense(1, activation='sigmoid', name='score_model')(score_x)
confidence_x = Dense(512, activation='relu')(x)
confidence_out = Dense(1, name='confidence_model')(confidence_x)
model = Model(inputs=base_model.input, outputs=[score_out, confidence_out])
for layer in base_model.layers:
    layer.trainable = False
losses = {'score_model': 'mae', 'confidence_model': 'mae'}
loss_weights = {'score_model': 1, 'confidence_model': 1}
model.compile(loss=losses, loss_weights=loss_weights, optimizer=optimizer, metrics=['mse','mae'], run_eagerly=True)
When I look at model.summary(), it has twice as many trainable parameters as the previous model, which is exactly what I was expecting. Everything looks right to me so far.
But when I train this model the performance on the score is much worse. I was thinking it would be the same (within stochastic variation). After the first epoch, the loss from the first model is around 0.125. The score_model_loss from the second model is around 0.554. Clearly I'm not completely separating the models. What am I missing?
Note: this answer will work well only because the layers that do the feature extraction are frozen. As @Akshay Sehgal stated in the comments:
optimizing for 2 goals together is actually a completely different problem than optimizing 2 independent goals separately
In this case, we are optimizing the 2 goals separately.
The easiest solution is probably to write a custom training loop with 2 tf.GradientTape objects, one for each goal. Let's consider this really simple example:
Dummy data
Let's create some random data:
import tensorflow as tf
X = tf.random.normal((1000,1))
y1= 3*X + 1
y2 = -2*X +2
ds = tf.data.Dataset.from_tensor_slices((X,y1,y2)).batch(10)
Creating a model with 2 outputs
In this example, I skip the feature extraction step, as a simple linear regression will work for the data. But since your feature extractor network is frozen, the situation is similar.
inp = tf.keras.Input((1,))
dense_1 = tf.keras.layers.Dense(1, name="objective1")(inp)
dense_2 = tf.keras.layers.Dense(1, name="objective2")(inp)
model = tf.keras.Model(inputs=inp, outputs=[dense_1, dense_2])
# setting up the loss functions as well as the optimizer
opt = tf.optimizers.SGD()
loss_func1 = tf.losses.mean_squared_error
loss_func2 = tf.losses.mean_absolute_error
Note the names given to the two dense layers: I will use them later to retrieve the appropriate weights.
Getting the weights to optimize
We can use the names set before to retrieve the variables belonging to each objective:
var1, var2 = [], []
for l in model.layers:
    if "objective1" in l.name:
        var1 += l.trainable_variables
    if "objective2" in l.name:
        var2 += l.trainable_variables
The training loop
You simply need two tapes, one for each objective. You can use different optimizers as well, if it makes the training better.
counter = 0
for x, y1, y2 in ds:
    counter += 1
    with tf.GradientTape() as tape1, tf.GradientTape() as tape2:
        pred1, pred2 = model(x)
        loss1 = loss_func1(y1, pred1)
        loss2 = loss_func2(y2, pred2)
    grad1 = tape1.gradient(loss1, var1)
    grad2 = tape2.gradient(loss2, var2)
    opt.apply_gradients(zip(grad1, var1))
    opt.apply_gradients(zip(grad2, var2))
    if counter % 10:
        print(f"Step : {counter}, objective1: {tf.reduce_mean(loss1)}, objective2: {tf.reduce_mean(loss2)}")
If we run the training, we get:
Step : 1, objective1: 4.609124183654785, objective2: 2.6634981632232666
[...]
Step : 99, objective1: 7.176481902227555e-14, objective2: 0.030187154188752174
The principal advantage of training that way is that you only need to extract the features once for the two objectives.
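(Applied to the frozen-Xception model from the question, the same idea would look roughly like the sketch below. It assumes each head-specific layer is given a name starting with the score_model / confidence_model prefix so its variables can be collected per objective; the layer names here are my own.)
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

base_model = Xception(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
base_model.trainable = False
x = GlobalAveragePooling2D()(base_model.output)

# name every head-specific layer so its variables can be split per objective
score_x = Dense(512, activation='relu', name='score_model_dense')(x)
score_out = Dense(1, activation='sigmoid', name='score_model_out')(score_x)
conf_x = Dense(512, activation='relu', name='confidence_model_dense')(x)
conf_out = Dense(1, name='confidence_model_out')(conf_x)
model = Model(inputs=base_model.input, outputs=[score_out, conf_out])

score_vars, conf_vars = [], []
for layer in model.layers:
    if layer.name.startswith('score_model'):
        score_vars += layer.trainable_variables
    if layer.name.startswith('confidence_model'):
        conf_vars += layer.trainable_variables
# score_vars and conf_vars can then be fed to the two-GradientTape loop above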

Character-based Text Classification with Triplet Loss

I'm trying to implement a text classifier using triplet loss to classify different job descriptions into categories, based on this paper. But whatever I do, the classifier yields very bad results.
For the embedding I followed this tutorial, and the NN architecture is based on this article.
I create my encodings using:
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_char_length = 20
group_numbers = range(0, len(job_groups))
char_vocabulary = {'PAD': 0}
X_char = []
y_temp = []
i = 1
for group, number in zip(job_groups, group_numbers):
    for job in group:
        job_cleaned = some_cleaning_function(job)
        job_enc = []
        for c in job_cleaned:
            if c in char_vocabulary.keys():
                job_enc.append(char_vocabulary[c])
            else:
                char_vocabulary[c] = i
                job_enc.append(char_vocabulary[c])
                i += 1
        X_char.append(job_enc)
        y_temp.append(number)
X_char = pad_sequences(X_char, maxlen=max_char_length, truncating='post')
My Neural Network is set up the following way:
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Dense
from tensorflow.keras.models import Model

def create_base_model():
    char_in = Input(shape=(max_char_length,), name='Char_Input')
    char_enc = Embedding(input_dim=len(char_vocabulary)+1, output_dim=20, mask_zero=True, name='Char_Embedding')(char_in)
    x = Bidirectional(LSTM(64, return_sequences=True, recurrent_dropout=0.2, dropout=0.4))(char_enc)
    x = Bidirectional(LSTM(64, return_sequences=True, recurrent_dropout=0.2, dropout=0.4))(x)
    x = Bidirectional(LSTM(64, return_sequences=True, recurrent_dropout=0.2, dropout=0.4))(x)
    x = Bidirectional(LSTM(64, return_sequences=False, recurrent_dropout=0.2, dropout=0.4))(x)
    out = Dense(128, activation="softmax")(x)
    return Model(char_in, out)

def get_siamese_triplet_char():
    anchor_input_c = Input(shape=(max_char_length,), name='Char_Input_Anchor')
    pos_input_c = Input(shape=(max_char_length,), name='Char_Input_Positive')
    neg_input_c = Input(shape=(max_char_length,), name='Char_Input_Negative')
    base_model = create_base_model()
    encoded_anchor = base_model(anchor_input_c)
    encoded_positive = base_model(pos_input_c)
    encoded_negative = base_model(neg_input_c)
    inputs = [anchor_input_c, pos_input_c, neg_input_c]
    outputs = [encoded_anchor, encoded_positive, encoded_negative]
    siamese_triplet = Model(inputs, outputs)
    siamese_triplet.add_loss(triplet_loss(outputs))
    siamese_triplet.compile(loss=None, optimizer='adam')
    return siamese_triplet, base_model
The triplet loss is defined as follows:
from tensorflow.keras import backend as K

def triplet_loss(inputs):
    anchor, positive, negative = inputs
    positive_distance = K.square(anchor - positive)
    negative_distance = K.square(anchor - negative)
    positive_distance = K.sqrt(K.sum(positive_distance, axis=-1, keepdims=True))
    negative_distance = K.sqrt(K.sum(negative_distance, axis=-1, keepdims=True))
    loss = positive_distance - negative_distance
    loss = K.maximum(0.0, 1 + loss)
    return K.mean(loss)
The model is then trained with:
siamese_triplet_char.fit(x=[Anchor_chars_train,
                            Positive_chars_train,
                            Negative_chars_train],
                         shuffle=True, batch_size=8, epochs=22, verbose=1)
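(For completeness, a rough sketch of how the anchor/positive/negative arrays used above could be sampled from the encoded data. This is my own illustration, not the code from the question; it assumes X_char and y_temp from the encoding step and at least two examples per class. Real triplet training usually benefits from harder, per-batch mining.)
import numpy as np

X_char = np.asarray(X_char)
y_temp = np.asarray(y_temp)

def sample_triplets(X, y, n_triplets=1000, seed=0):
    rng = np.random.default_rng(seed)
    anchors, positives, negatives = [], [], []
    for _ in range(n_triplets):
        label = rng.choice(np.unique(y))                # pick a class for anchor/positive
        same = np.flatnonzero(y == label)
        diff = np.flatnonzero(y != label)
        a, p = rng.choice(same, size=2, replace=False)  # two different samples of that class
        n = rng.choice(diff)                            # one sample of any other class
        anchors.append(X[a]); positives.append(X[p]); negatives.append(X[n])
    return np.array(anchors), np.array(positives), np.array(negatives)

Anchor_chars_train, Positive_chars_train, Negative_chars_train = sample_triplets(X_char, y_temp)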
My goal is to first train the network without label data in order to compress the embedding space of the different phrases, and second, add a classification layer and create the final classifier.
My general problem is that even though the first phase shows decreasing loss values, the model overfits and the validation results jump around, and the second phase fails badly, as I am not able to train the model to actually classify.
My questions are the following:
Could someone explain the embedding architecture? What is the output dimension referring to? The individual characters? Would that even make sense? Or is there a better way to encode the input data?
How can I add validation_data to a network that does not use labeled data? I could use validation_split, but I would prefer passing specific validation data, as my data is stratified.
Is there a reason why the classification does not work? Applying a simple k-nearest neighbor algorithm achieves at best 0.5 accuracy! Is it because of the data? Or is there a systematic error in my system?
All ideas and suggestions are really appreciated!

How should the input relate/map to label y if Keras Model.fit() is given a list of input train arrays?

I am trying to work with a deep learning model in the two following scenarios, where two different inputs are given. I want to achieve the following:
Train two models (with different weights but the same architecture) with the same input and concatenate the result. So in model.fit(), I am passing just the trainX value. The code is given below, and it works fine.
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras.layers import Conv1D, Dense, GlobalMaxPooling1D, Input
from tensorflow.keras.models import Model

def create_model(input_tensor):
    x = Conv1D(filters=16, kernel_size=6, strides=5, kernel_initializer="uniform", activation="relu")(input_tensor)
    x = GlobalMaxPooling1D()(x)
    x = Dense(2, activation='softmax')(x)
    return x

dataframe = pd.read_csv(Filename, index_col=0)
X = dataframe.values[:, :].astype(float)
Y = dataframe.values[:, 1]
trainX, testX, trainy, testy = train_test_split(X, Y, test_size=0.2, random_state=200, shuffle=True)
input_shape = (33000, 1)
input_tensor = Input(input_shape)
pred_a = create_model(input_tensor)
pred_b = create_model(input_tensor)
out = keras.layers.Multiply()([pred_a, pred_b])
model = Model(inputs=input_tensor, outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])
history = model.fit(trainX, trainy)
Train the same model (with the same weights) twice but with different inputs. I am confused about how to pass the inputs in this case. Normally, we have an equal number of instances in the trainX and trainy data. If I pass a list like model.fit([x_train_1, x_train_2], trainy), then the number of instances in x_train_1 and x_train_2 combined will be double that of y. How does trainy correspond to the input trainX in this case?
The input and corresponding output of a model have shapes X = (batch_size, ...), y = (batch_size, ...).
In the case of multiple inputs, you can define multiple input layers and feed them to your different model instances as follows:
inp_A = Input(shape=(...))
inp_B = Input(shape=(...))
pred_A = create_model(inp_A)
pred_B = create_model(inp_B)
*** Other layers and code ****
model = Model(inputs=[inp_A, inp_B], outputs=out)
*** Other code ***
Then you can call model.fit, passing a list of inputs and a single output array.
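(A minimal usage sketch, assuming model is a two-input Model like the one sketched above with a 2-class softmax output as in the first scenario; the array names and shapes are placeholders. The key point is that both input arrays and trainy must all have the same number of rows, so each label still describes exactly one (x_train_1[i], x_train_2[i]) pair rather than two separate samples.)
import numpy as np
from tensorflow.keras.utils import to_categorical

x_train_1 = np.random.rand(100, 33000, 1)                   # 100 samples, first input view
x_train_2 = np.random.rand(100, 33000, 1)                   # 100 samples, second input view
trainy = to_categorical(np.random.randint(0, 2, size=100))  # one label row per sample pair

history = model.fit([x_train_1, x_train_2], trainy, epochs=5, batch_size=16)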

Huge difference between training accuracy and evaluation accuracy using Tensorflow Datasets class + Keras

I created a dataset from my data, and this dataset is in the form (features, labels). The features' dimension is [?, 731, 7] (where the ? should be 400), and the corresponding labels' dimension is [4,], as shown in my dataset. Each [731, 7] sample corresponds to a 4-element array like [0, 1, 0, 0].
A few sample data rows are shown in the attached screenshots (Sampledata1, Sampledata2).
After building a simple multi-layer neural network, the training process is normal, as shown below. But when I use the same dataset to validate (just to check whether the algorithm is working or not), I actually get a huge difference.
I don't think this is right, but I am not sure whether it happens because I used .eval() wrong or because my datasets are wrong.
My code for datasets creation:
import glob
import tensorflow as tf

filenames = glob.glob(main_dir + keywords)
# filenames = ['test.txt', 'test2.txt']
length = len(filenames)  # num of files
length_samesat = 100  # happens to be this... I designed it in propagation
batch_num = 731  # happens to be this...
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.flat_map(lambda filename: tf.data.TextLineDataset(filename).skip(3))
dataset = dataset.map(lambda string: tf.string_split([string],delimiter=', ').values)
dataset = dataset.map(lambda x: tf.strings.to_number(x))
dataset = dataset.batch(batch_num)
dataset = dataset.map(lambda tensor: tf.reshape(tensor,[batch_num,7]))
dataset = dataset.batch(1).repeat()
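(The label dataset datalabel used in the zip below is not shown; here is a sketch of how it might be built so its batching matches the feature pipeline. This is an assumption on my part: it takes a (400, 4) one-hot label array, loaded from a hypothetical labels.npy file.)
import numpy as np

labels_array = np.load('labels.npy')          # hypothetical file, shape (400, 4)
datalabel = tf.data.Dataset.from_tensor_slices(labels_array)
datalabel = datalabel.batch(1).repeat()       # batch(1) so it pairs with one [1, 731, 7] feature batch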
Then I zip my dataset with my label dataset, create the NN, and run:
dataset_all = tf.data.Dataset.zip((dataset, datalabel))
dataset_all = dataset_all.shuffle(400)
visual_dataset(dataset_all,0,20)
# NN Model
inputs = tf.keras.Input(shape=(731,7,)) # Returns a placeholder tensor
# A layer instance is callable on a tensor, and returns a tensor.
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(400, activation='tanh')(x)
x = tf.keras.layers.Dense(400, activation='tanh')(x)
# x = tf.keras.layers.Dense(450, activation='tanh')(x)
# x = tf.keras.layers.Dense(300, activation='tanh')(x)
# x = tf.keras.layers.Dense(450, activation='tanh')(x)
# x = tf.keras.layers.Dense(200, activation='relu')(x)
# x = tf.keras.layers.Dense(100, activation='relu')(x)
predictions = tf.keras.layers.Dense(4, activation='softmax')(x)
# Instantiate the model given inputs and outputs.
model = tf.keras.Model(inputs=inputs, outputs=predictions)
# The compile step specifies the training configuration.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# Trains for 5 epochs
model.fit(dataset_all, epochs=5, steps_per_epoch=400)
model.evaluate(dataset_all, steps=400)
Thanks!
