I am currently using a neural network that produces a one-hot encoded output.
When I evaluate it with a classification report, I receive this warning:
UndefinedMetricWarning: Recall and F-score are ill-defined and being set
to 0.0 in samples with no true labels.
When one-hot encoding my output during the train-test-split phase, I had to drop one of the columns in order to avoid the Dummy Variable Trap. As a result, some of the predictions of my neural network are [0, 0, 0, 0], signaling that the sample belongs to the fifth category. I believe this to be the cause of the UndefinedMetricWarning.
Is there a solution to this? Or should I avoid classification reports in the first place? Is there a better way to evaluate these sorts of neural networks? I'm fairly new to machine learning and neural networks, so please forgive my ignorance. Thank you for all the help!
Edit #1:
Here is my network:
from keras.models import Sequential
from keras.layers import Dense

classifier = Sequential()
classifier.add(Dense(units=10000, input_shape=(30183,),
                     kernel_initializer='glorot_uniform', activation='relu'))
classifier.add(Dense(units=4583, kernel_initializer='glorot_uniform', activation='relu'))
classifier.add(Dense(units=1150, kernel_initializer='glorot_uniform', activation='relu'))
classifier.add(Dense(units=292, kernel_initializer='glorot_uniform', activation='relu'))
classifier.add(Dense(units=77, kernel_initializer='glorot_uniform', activation='relu'))
classifier.add(Dense(units=23, kernel_initializer='glorot_uniform', activation='relu'))
classifier.add(Dense(units=7, kernel_initializer='glorot_uniform', activation='relu'))
classifier.add(Dense(units=4, kernel_initializer='glorot_uniform', activation='softmax'))
classifier.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
After training the network, I predict values and convert them to class labels using:
import numpy as np
from sklearn.preprocessing import LabelBinarizer

labels = np.argmax(predictions, axis=-1)        # index of the highest softmax output
lb = LabelBinarizer()
labeled_predictions = lb.fit_transform(labels)  # back to a one-hot representation
When I call a classification report comparing y_test and labeled_predictions, I receive the warning.
As a side note for anyone curious, I am experimenting with natural language processing and neural networks. The reason the input vector of my network is so large is that it takes in count-vectorized text as part of its inputs.
Edit #2:
I converted the predictions into a dataframe and dropped duplicates for both the test set and the predictions, getting this result:
y_test.drop_duplicates()

      javascript  python  r  sql
738            0       0  0    0
4678           1       0  0    0
6666           0       0  0    1
5089           0       1  0    0
6472           0       0  1    0

predictions_df.drop_duplicates()

      javascript  python  r  sql
738            1       0  0    0
6666           0       0  0    1
5089           0       1  0    0
3444           0       0  1    0
So essentially, because of the way the softmax output is converted to binary, the predictions can never come out as [0, 0, 0, 0]. When one-hot encoding y_test, should I just not drop the first column?
Yes, I would say you should not drop the first column. What you do now is take the softmax output and pick the neuron with the highest value as the label (labels = np.argmax(predictions, axis=-1)), and with this approach you can never get a [0, 0, 0, 0] result vector. So instead, just create a one-hot vector with positions for all 5 classes. Your problem with sklearn should then disappear, as you will have samples with true labels for your 5th class.
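A minimal sketch of what that could look like, assuming the integer class labels 0-4 live in hypothetical arrays y_train_int and y_test_int (with the test features in X_test) and the final Dense layer is widened from 4 to 5 units:

import numpy as np
from keras.utils import to_categorical
from sklearn.metrics import classification_report

# Keep all 5 classes in the one-hot targets (no column dropped);
# the output layer of the network then needs units=5.
y_train_onehot = to_categorical(y_train_int, num_classes=5)
y_test_onehot = to_categorical(y_test_int, num_classes=5)

# ... train on y_train_onehot, then evaluate on integer labels:
pred_labels = np.argmax(classifier.predict(X_test), axis=-1)
true_labels = np.argmax(y_test_onehot, axis=-1)
print(classification_report(true_labels, pred_labels))

Comparing the integer labels directly also sidesteps the LabelBinarizer round trip.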
I'm also not sure the dummy variable trap is a problem for neural networks. I had never heard of it before, and a short Google Scholar search did not turn up any results; none of the neural network resources I've seen so far mention it either. So my guess (and it really is just a guess) is that it isn't a problem when training neural networks. This conclusion is also supported by the fact that the majority of NNs use a softmax at the end.
Related
Only the first output parameter is learned properly during training of a multi-output regression net; the second and subsequent parameters only seem to follow the first. It seems that the ground truth for the second output parameter is not used during training. How do I shape the tf.data.Dataset and feed it into the model.fit() function so that the second output parameter is trained?
import tensorflow as tf
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers
#create dataset from csv
file = pd.read_csv( 'minimalDataset.csv', skipinitialspace = True)
input = file["input"].values
output1 = file["output1"].values
output2 = file["output2"].values
dataset = tf.data.Dataset.from_tensor_slices((input, (output1, output2))).batch(4)
#create multi output regression net
input_layer = keras.Input(shape=(1,))
x = layers.Dense(20, activation="relu")(input_layer)
x = layers.Dense(60, activation="relu")(x)
output_layer = layers.Dense(2)(x)
model = keras.Model(input_layer, output_layer)
model.compile(optimizer="adam", loss="mean_squared_error")
#train model and make prediction (deliberately overfitting to illustrate problem)
model.fit(dataset, epochs=500)
prediction = model.predict(dataset)
minimalDataset.csv and predictions:
input  output1  output2  prediction_output1  prediction_output2
    0       -1        1           -0.989956           -0.989964
    1        2        0            1.834444            1.845085
    2        0        2            0.640249            0.596099
    3        1       -1            0.621426            0.646796
If I create two independent final Dense layers, the second parameter is learned accurately, but then I get two losses:
output_layer = (layers.Dense(1)(x), layers.Dense(1)(x))
Note: I want to use tf.data.Dataset because I build a 20k image/CSV dataset with it and do per-element transformations as preprocessing.
tf.data.Dataset.from_tensor_slices() slices along the first dimension. Because of this, the input and output tensors need to be transposed:
dataset = tf.data.Dataset.from_tensor_slices((tf.transpose(input), (tf.transpose([output1, output2])))).batch(4)
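An equivalent way to write the targets that may read more clearly is to stack the two output columns explicitly; a sketch, assuming the input, output1, output2 arrays loaded above (tf.transpose is a no-op on the 1-D input, so it can be passed as-is):

# Stack the two target columns side by side: shape (num_rows, 2),
# matching the model's single 2-unit output.
targets = tf.stack([output1, output2], axis=1)
dataset = tf.data.Dataset.from_tensor_slices((input, targets)).batch(4)
print(dataset.element_spec)   # inputs: (None,), targets: (None, 2)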
I already posted this question on Cross Validated, but thought the Stack Overflow community, being bigger, might be able to answer it faster.
I'd like to build a model that can output results for several multi-class classification problems at once. Suppose you have diagnostic data about a product that needs to be repaired and you want to predict the quantity of various part numbers that will be needed to repair the product. The input data is the same for all part numbers to be predicted.
Here's a concrete example. You have 2 part numbers that can get replaced, part A and part B. For part A you can replace 0, 1, 2, or 3 of them on the product. For part B you can replace 0, 2, or 4 (they are replaced in pairs). How can a TensorFlow/Keras neural network be configured so that the probabilities of replacing part A 0, 1, 2, or 3 times sum to 1, with similar behavior for part B (its probabilities also sum to 1)?
Simple code like the code below would treat all of the values as coming from the same discrete probability distribution. How can this be modified to create 2 discrete probability distributions in the output:
from keras.models import Sequential
from keras.layers import Dense

def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.add(Dense(7, activation='softmax'))
    # compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
UPDATE
Based on the comment(s), will something like this work?
References this question
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Flatten, Concatenate
from mypackage import get_my_data, compiler_args
data = get_my_data() # obviously, this is a stand-in for however you get your data.
input_layer = Input(data.shape[1:])
hidden = Flatten()(input_layer)
hidden = Dense(192, activation='relu')(hidden)
main_output = Dense(192, activation='relu')(hidden)
# I'm going to build each individual parallel set of layers separately
part_a = Dense(10, activation='relu')(main_output)
output_a = Dense(4, activation='softmax')(part_a) # multi-class classification for part A
part_b = Dense(10, activation='relu')(main_output) # note that it is main_output again
output_b = Dense(3, activation='softmax')(part_b) # multi-class classification for part B
final_output = Concatenate()([output_a, output_b]) # Combine the outputs into final output layer
model = tf.keras.Model(input_layer, final_output)
model.compile(**compiler_args)
model.summary()
I got the following data sample:
[1,2,1,4,5],[1,2,1,4,5],[0,2,7,0,1] with a label of [1,0,1]
....
[1,9,1,4,5],[1,5,1,4,5],[0,7,7,0,1] with a label of [0,1,1]
I can't train it on a single series of [1,2,1,4,5] with a label of 1 or 0, as the whole row carries meaningful context, so all 15 input digits should be inferred together.
It's not your typical classification, and it doesn't seem to be a regression problem either. Also, the data is not image-related; it comes from a scientific domain.
Obviously I am feeding the data to the net as a flat 15-node input:
model = Sequential([
    Dense(units=16, input_shape=scaled_train_samples[0].shape, activation='relu'),
    Dense(units=32, activation='relu'),
    Dense(units=3, activation='???'),
])
Which output activation function would be ideal in such a case?
I would recommend giving the network 3 outputs. Since the data affects all 3 "sub-labels", the network only branches apart at the classification layer; if you want, you can add more layers to each specific branch.
I'm assuming each "sub-label" is a binary classification, which is why I chose sigmoid (it returns a value from 0 to 1, so a larger number means the network favors class 1 over class 0).
To do this, you would have to change to the Functional API like this:
from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import Adam

visible = Input(shape=scaled_train_samples[0].shape)
hidden = Dense(16, activation='relu')(visible)
hidden = Dense(32, activation='relu')(hidden)
hidden = Dense(16, activation='relu')(hidden)

out1 = Dense(units=1, activation='sigmoid', name='OUT1')(hidden)
out2 = Dense(units=1, activation='sigmoid', name='OUT2')(hidden)
out3 = Dense(units=1, activation='sigmoid', name='OUT3')(hidden)

finalModel = Model(inputs=visible, outputs=[out1, out2, out3])

optimizer = Adam(learning_rate=.0001)
losses = {
    'OUT1': 'binary_crossentropy',
    'OUT2': 'binary_crossentropy',
    'OUT3': 'binary_crossentropy',
}
finalModel.compile(optimizer=optimizer, loss=losses,
                   metrics={'OUT1': 'accuracy', 'OUT2': 'accuracy', 'OUT3': 'accuracy'})
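A hypothetical training call to go with this, assuming X holds the 15-value input rows and y1, y2, y3 are the three binary columns split out of labels like [1, 0, 1]:

# Targets are passed as a dict keyed by the output layer names defined above.
finalModel.fit(X, {'OUT1': y1, 'OUT2': y2, 'OUT3': y3},
               epochs=20, batch_size=32, validation_split=0.1)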
I am working on a stock market prediction project using sentiment analysis. I am trying to create a CNN model where I am passing 4000 days of stock data with a batch size of 100. At the end of the dense layer, I want to add a regression layer to get the price of the stock.
def Model(train_data, mode):  # `mode` added as a parameter; it was undefined in the original snippet
    input_layer = tf.reshape(tf.cast(train_data, tf.float32), [-1, 1, 100, 2])
    conv1 = tf.layers.conv2d(inputs=input_layer, filters=32, kernel_size=[1, 5], padding="same",
                             activation=tf.nn.relu, strides=1,
                             kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[1, 2], strides=[1, 2])
    conv2 = tf.layers.conv2d(inputs=pool1, filters=8, kernel_size=[1, 5], padding="same",
                             activation=tf.nn.relu, strides=1,
                             kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[1, 5], strides=[1, 5])
    conv3 = tf.layers.conv2d(inputs=pool2, filters=2, kernel_size=[1, 2], padding="same",
                             activation=tf.nn.relu, strides=1,
                             kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
    pool3 = tf.layers.max_pooling2d(inputs=conv3, pool_size=[1, 2], strides=[1, 2])
    pool3_flat = tf.reshape(pool3, [40, 1 * 5 * 2])
    dense = tf.layers.dense(inputs=pool3_flat, units=5, activation=tf.nn.relu)
    dropout = tf.layers.dropout(inputs=dense, rate=0.2,
                                training=mode == tf.estimator.ModeKeys.TRAIN)
    logits = tf.layers.dense(inputs=dropout, units=1)
    return logits
I am referring to https://www.tensorflow.org/tutorials/estimators/cnn for the model, but that tutorial does classification. Can anybody suggest an approach for regression? The train_data for the model has a shape of [2, 4000], where one row holds the normalized stock prices and the other the sentiment factor.
The only thing you would have to do is add a fully connected layer at the very end and select a linear activation. Intuitively, this takes the outputs of your conv layers and applies y = mx + b to them. Your fully connected output layer would have 40 nodes (one for each output). In fact, you already have one dense layer in that code; if your output is of size 40, just make it 40 instead of 5.
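A minimal sketch of the loss and training op in the same TF 1.x tf.layers style as the question, assuming a labels tensor of target prices (a hypothetical name) shaped like logits:

# `logits` above is already a 1-unit dense layer with no activation,
# i.e. a linear regression head producing one price per example.
labels = tf.placeholder(tf.float32, shape=[None, 1])   # hypothetical target prices
loss = tf.losses.mean_squared_error(labels=labels, predictions=logits)
train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss)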
Just a side note: traditionally CNNs have been used for image classification, and only recently have they started migrating to other applications (such as spam detection). I would advise trying a simple feed-forward neural network first, and if that does not work, perhaps an RNN before this.
I'm trying to combine two outputs produced by the same network, which makes predictions on a 4-class task and a 10-class task. I then combine these outputs into a length-14 array that I use as my end target.
While this seems to work, the predictions are always for a single class, so it produces a probability distribution concerned only with selecting 1 of the 14 options instead of 2. What I actually need is 2 predictions, one for each task, and I want this all to be produced by the same model.
from keras.layers import Input, LSTM, Dense, concatenate
from keras.models import Model

input = Input(shape=(100, 100), name='input')
lstm = LSTM(128)(input)
output1 = Dense(4, activation='softmax', name='output1')(lstm)
output2 = Dense(10, activation='softmax', name='output2')(lstm)
output3 = concatenate([output1, output2])
model = Model(inputs=[input], outputs=[output3])
My issue here is determining an appropriate loss function and method of prediction. For prediction I can simply grab the output of each layer after the softmax, but I'm unsure how to set up the loss function for each of these outputs so that they both get trained.
Any ideas?
Thanks a lot
You don't need to concatenate the outputs; your model can have two outputs:
input = Input(shape=(100, 100), name='input')
lstm = LSTM(128)(input)
output1 = Dense(4, activation='softmax', name='output1')(lstm)
output2 = Dense(10, activation='softmax', name='output2')(lstm)
model = Model(inputs=[input], outputs=[output1, output2])
Then to train this model, you typically use two losses that are weighted to produce a single loss:
model.compile(optimizer='sgd',
              loss=['categorical_crossentropy', 'categorical_crossentropy'],
              loss_weights=[0.2, 0.8])
Just make sure to format your data correctly, as each input sample now corresponds to two output labels. For more information, check the Functional API Guide.
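As a sketch of the corresponding fit call, assuming x is the input array and y1 (4-class one-hot) and y2 (10-class one-hot) are hypothetical target arrays:

# Targets can be passed as a dict keyed by output layer name
# (or as a list in the same order as the outputs).
model.fit(x, {'output1': y1, 'output2': y2}, epochs=10, batch_size=32)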