Why is the MSE so high after I inverse-transform the data? - python

I am building a regression model to predict force plate data from plantar pressure data, using a CNN. I have two different datasets: dataset A (force plate data) and dataset B (plantar pressure data). Before training I normalize the data with min-max scaling. During training I get good results and a low MSE. When I test the model and map the normalized predictions back to the original scale with the inverse transform, the MSE I compute on the inverted data is suddenly very high, even though the predictions themselves look very good to me. Why does this MSE value become so large?
The data look like this:
Force Plate Data: [ 4.46733, 4.39629, -34.2351, -4077.23, -6206.81, -874.539 ]
Force Plate Data Shape: (15000, 6)
SmartInsole Data: [ 0. 0. 0. 13. 1. 0. 0. 0. 0. 0.
15. 92. 60. 0.
36. 0. 0. 0. 0. 0. 0. 62. 80. 58. 37. 0. 0. 0.
0. 40. 83. 72. 32. 22. 0. 0. 0. 0. 0. 0. 98. 108.
74. 56. 30. 17. 0. 0. 44. 121. 127. 83. 0. 0. 0. 0.
0. 3. 83. 64. 63. 63. 77. 70. 43. 55. 115. 138. 144. 137.
0. 0. 0. 0. 66. 107. 127. 146. 150. 52. 0. 0. 0. 129.
133. 18. 0. 0. 0.]
SmartInsole Data Shape : (15000,89)
Here is my model code:
## Load Data
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense
from tensorflow.keras.optimizers import Adam

Insole = pd.read_csv('1225_Rwalk10min1_list.txt', header=None, low_memory=False)
SIData = np.array(Insole)
df = pd.read_csv('1225_Rwalk10min.csv', low_memory=False)
columns = ['Fx','Fy','Fz','Mx','My','Mz']
selected_df = df[columns]
FPDatas = selected_df[:15000]
label = pd.read_csv('label.txt', header=None, low_memory=False)
labelData = np.array(label).astype('float32')
SmartInsole = np.array(SIData[:15000]).astype('float32')
FPData = np.array(FPDatas).astype('float32')
Label = np.array(labelData[:15000]).astype('float32')
SIlabeled = np.concatenate((Label, SmartInsole), axis=1)
SIlabeled = np.array(SIlabeled).astype('float32')
## End Load Data
# Data Normalization
minInsole = SIlabeled.min()
maxInsole = SIlabeled.max()
xscale = (SIlabeled - minInsole) / ( maxInsole - minInsole )
FPmax = []
FPmin = []
yscale = []
for i in range(0,6):
    minFP = FPData[:,i].min()
    maxFP = FPData[:,i].max()
    FPmin.append(minFP)
    FPmax.append(maxFP)
FPmin = np.array(FPmin)
FPmax = np.array(FPmax)
for i in range(0,6):
    scale = (FPData[:,i] - FPmin[i]) / ( FPmax[i] - FPmin[i] )
    yscale.append(scale)
yscale = np.array(yscale)
yscale = yscale.transpose()
#End Data Normalization
# Spliting Data
sample_size = xscale.shape[0]
time_steps = xscale.shape[1]
input_dimension = 1
train_data_reshaped = xscale.reshape(sample_size,time_steps,input_dimension)
X_train, X_test, y_train, y_test = train_test_split(train_data_reshaped, yscale, test_size=0.20, random_state=2)
print(X_train.shape,X_test.shape)
print(y_train.shape,y_test.shape)
#End Spliting Data
#Model Structure
model = Sequential(name="model_conv1D")
n_timesteps = train_data_reshaped.shape[1]
n_features = train_data_reshaped.shape[2]
model.add(Input(shape=(n_timesteps,n_features)))
model.add(Conv1D(filters=64, kernel_size=3, padding='same', activation='relu'))
model.add(Conv1D(filters=128, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters=256, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(500, activation='relu'))
model.add(Dense(6, activation='sigmoid'))
model.summary()
model.compile(loss='mse', optimizer=Adam(learning_rate=0.002), metrics=['mse'])
history = model.fit(X_train, y_train, batch_size=64, epochs=200,
                    validation_data=(X_test, y_test), verbose=2)
#End Model Structure
#Evaluate Model
model.evaluate(train_data_reshaped, yscale)
ypred = model.predict(train_data_reshaped)
plt.figure()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
# plt.show()
plt.savefig('Loss Result.png')
print('MSE: ',mean_squared_error(yscale, ypred))
print('RMSE: ',math.sqrt(mean_squared_error(yscale, ypred)))
print('Coefficient of determination (r2 Score): ', r2_score(yscale, ypred))
#Inverse
y_inverse = []
y_pred_inverse = []
for i in range(0,6):
    Y_inver = yscale[0:15000, i]*( FPmax[i] - FPmin[i] )+FPmin[i]
    Pred_inver = ypred[0:15000, i]*( FPmax[i] - FPmin[i] )+FPmin[i]
    y_inverse.append(Y_inver)
    y_pred_inverse.append(Pred_inver)
y_inverse = np.array(y_inverse)
y_inverse = y_inverse.transpose()
y_pred_inverse = np.array(y_pred_inverse)
y_pred_inverse = y_pred_inverse.transpose()
print('MSE: ',mean_squared_error(y_inverse, y_pred_inverse))
print('RMSE: ',math.sqrt(mean_squared_error(y_inverse, y_pred_inverse)))
print('Coefficient of determination (r2 Score): ', r2_score(y_inverse, y_pred_inverse))
x=[]
colors=['red','green','brown','teal','gray','black','maroon','orange','purple']
colors2=['green','red','orange','black','maroon','teal','blue','gray','brown']
x = np.arange(0,3000)*60/3000
for i in range(0,6):
    plt.figure(figsize=(15,6))
    # plt.figure()
    plt.plot(x, y_inverse[0:3000,i], color='red')
    plt.plot(x, y_pred_inverse[0:3000,i], markerfacecolor='none', color='green')
    plt.title('CNN Regression (Training Data)')
    if i < 3:
        plt.ylabel('Force/'+columns[i])
    else:
        plt.ylabel('Moment/'+columns[i])
    plt.xlabel('Time(s)')
    plt.legend(['Real value', 'Predicted Value'], loc='best')
    plt.show()
#End Evaluate Model
The model loss: (loss curve plot omitted)
The MSE computed on the normalized data:
MSE: 0.00033982666
RMSE: 0.018434387873554003
Coefficient of determination (r2 Score): 0.9934412267882915
The MSE after inverting the data:
MSE: 711726.3
RMSE: 843.6387334042931
Coefficient of determination (r2 Score): 0.9934412272949391
Sample prediction results:
print("Real Value : ", y_inverse[51])
print("prediction Value : ", y_pred_inverse[51])

Real Value :       [    4.46733     4.39629   -34.235107 -4077.2305  -6206.8125   -874.53906 ]
prediction Value : [    6.6143274   5.6351166  -31.929413 -3412.164   -6177.2344  -2047.6455  ]
How can I keep the MSE value from changing when the data is inverse-transformed?
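Note that MSE is scale-dependent: undoing the min-max scaling multiplies each column's errors by its range (FPmax[i] - FPmin[i]), so the squared errors grow by that range squared, while the r2 score is unchanged because it is invariant to this kind of rescaling. A minimal sketch of that relationship, using made-up numbers rather than the data above:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
y_true_scaled = rng.random((100, 1))                           # targets on the 0-1 scale
y_pred_scaled = y_true_scaled + rng.normal(0, 0.02, (100, 1))  # small error on that scale

# pretend the original column spanned roughly -6000 .. 4000, like the force plate moments
col_min, col_max = -6000.0, 4000.0
y_true_orig = y_true_scaled * (col_max - col_min) + col_min
y_pred_orig = y_pred_scaled * (col_max - col_min) + col_min

print(mean_squared_error(y_true_scaled, y_pred_scaled))  # small, on the 0-1 scale
print(mean_squared_error(y_true_orig, y_pred_orig))      # larger by a factor of (col_max - col_min)**2
print(r2_score(y_true_scaled, y_pred_scaled))            # identical to ...
print(r2_score(y_true_orig, y_pred_orig))                # ... this one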

Related

How to train a regression model with multiple datasets

The datasets I am working with correspond to individual time series signals. Each signal is unique, with a differing total number of data points. Here I want to simulate dataset A using dataset B.
Dataset splitting code:
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

x = SmartInsole[:,0:178]
y = Avg[:,0]
y = y.reshape(-1,1)
scaler_x = MinMaxScaler()
scaler_y = MinMaxScaler()
scaler_x.fit(x)
xscale = scaler_x.transform(x)
scaler_y.fit(y)
yscale = scaler_y.transform(y)
X_train, X_test, y_train, y_test = train_test_split(xscale, yscale, test_size=0.25, random_state=2)
A sample of the dataset after splitting and normalization:
[0.83974359 0.81818182 0.60264901 0.10457516 0. 0.
0. 0. 0. 0.66878981 0.7654321 0.77439024
0.05031447 0.18674699 0. 0. 0. 0.
0.83892617 0.85620915 0.8590604 0.77852349 0.57236842 0.35333333
0. 0. 0. 0.05217391 0.6835443 0.85064935
0.72955975 0.08275862 0. 0. 0. 0.
0. 0.73758865 0.84868421 0.76923077 0.69230769 0.53472222
0.53571429 0.65714286 0.49450549 0.47747748 0.72592593 0.77707006
0.86928105 0.80519481 0.31333333 0. 0.0516129 0.
0. 0. 0. 0.39316239 0.35036496 0.07086614
0.38392857 0.57843137 0.58181818 0.68376068 0.74100719 0.84868421
0.81879195 0.80519481 0.14 0. 0. 0.
0. 0. 0.83802817 0.89189189 0.88811189 0.48979592
0. 0. 0. 0. 0. 0.33793103
0. 0. 0. 0. 0. 0.9929078
0.97222222 0.81118881 0.45890411 0. 0. 0.
0. 0.63551402 0.97810219 0.95172414 0.95205479 0.88356164
0.94630872 0.40384615 0. 0. 0. 0.97222222
0.9862069 0.96478873 0.76510067 0.52 0.24113475 0.
0. 0. 0.21568627 0.88970588 0.94594595 0.89864865
0.08510638 0.37662338 0.0979021 0. 0. 0.
0.46153846 0.92517007 0.74590164 0.48571429 0.05882353 0.19847328
0.11428571 0.07857143 0.11510791 0.56375839 0.80794702 0.87012987
0.81045752 0.21527778 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0.07042254 0.21052632 0.62745098 0.75471698 0.80503145
0.78980892 0. 0. 0. 0. 0.
0. 0.55357143 0.66878981 0.67272727 0.17682927 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. ]
[0.59662633]
(3000, 178)
(3000, 1)
I am working with Keras and trying to fit a ResNet50 to the data just to evaluate it.
Below is my ResNet model structure.
Below is the identity block:
def identity_block(input_tensor,units):
    """The identity block is the block that has no conv layer at shortcut.
    # Arguments
        input_tensor: input tensor
        units: output shape
    # Returns
        Output tensor for the block.
    """
    x = layers.Dense(units)(input_tensor)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Dense(units)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Dense(units)(x)
    x = layers.BatchNormalization()(x)
    x = layers.add([x, input_tensor])
    x = layers.Activation('relu')(x)
    return x
Below is the dense block (dens_block):
def dens_block(input_tensor,units):
    """A block that has a dense layer at shortcut.
    # Arguments
        input_tensor: input tensor
        units: output tensor shape
    # Returns
        Output tensor for the block.
    """
    x = layers.Dense(units)(input_tensor)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Dense(units)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Dense(units)(x)
    x = layers.BatchNormalization()(x)
    shortcut = layers.Dense(units)(input_tensor)
    shortcut = layers.BatchNormalization()(shortcut)
    x = layers.add([x, shortcut])
    x = layers.Activation('relu')(x)
    return x
Resnet50 model:
def ResNet50Regression():
    Res_input = layers.Input(shape=(178,))
    width = 16
    x = dens_block(Res_input,width)
    x = identity_block(x,width)
    x = identity_block(x,width)
    x = dens_block(x,width)
    x = identity_block(x,width)
    x = identity_block(x,width)
    x = dens_block(x,width)
    x = identity_block(x,width)
    x = identity_block(x,width)
    x = layers.BatchNormalization()(x)
    x = layers.Dense(1,activation="linear")(x)
    model = models.Model(inputs=Res_input, outputs=x)
    return model
Essentially, I am fitting the model to each dataset as follows:
import datetime
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

model = ResNet50Regression()
model.compile(loss='mse', optimizer=Adam(learning_rate=0.0001), metrics=['mse'])
model.summary()
starttime = datetime.datetime.now()
history = model.fit(X_train, y_train, epochs=200, batch_size=64, verbose=2, validation_data=(X_test, y_test))
endtime = datetime.datetime.now()
How can I get optimal prediction results from the above model? My current prediction results are shown below (plot image omitted). Based on them, the model is not able to predict properly. How can I make the prediction results correspond to the real values?
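One step worth checking when comparing predictions against the real values: the model is trained on scaler_y-normalized targets, so its raw outputs live in [0, 1] and need to be mapped back through the fitted scaler before they are comparable to the original signal. A minimal sketch of that evaluation step, assuming the model, scalers, and split defined above:

from sklearn.metrics import mean_squared_error, r2_score

# predictions come out in the normalized 0-1 space of scaler_y
y_pred_scaled = model.predict(X_test)
# map both predictions and test targets back to the original units
y_pred = scaler_y.inverse_transform(y_pred_scaled)
y_true = scaler_y.inverse_transform(y_test)

print('MSE:', mean_squared_error(y_true, y_pred))
print('R2 :', r2_score(y_true, y_pred))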

keras, sparse_categorical_crossentropy label Y dimension and value range

Can someone please explain the dimensionality logic of input X and class Y for the sparse_categorical_crossentropy loss function?
I checked both the Keras and TF2 docs and examples, and the post "Cross Entropy vs Sparse", but one point is not clear to me:
Does the Y vector need to be expanded to the same number of columns as the number of classes the model outputs (if I use a softmax output), or does Keras automatically expand Y?
In my case, I have 32x32 input images, and Y is a number between 0 and 10. So the input is (batch_size, h, w), and Y is (batch_size, 1) holding integer values 0...10.
X = (73257, 32, 32)
Y = (73257, 1)
model.fit(X, Y, epochs=30, validation_split=0.10, batch_size=1, verbose=True)
The model itself is just a Sequential stack of Dense layers with a softmax output.
model = Sequential()
model.add(Dense(32, activation='relu',
                input_shape=input_shape,
                kernel_initializer='he_uniform',
                bias_initializer='ones'))
# bunch of Dense layers and output softmax
model.add(Dense(10, activation='softmax'))
The error is a dimensionality mismatch:
ValueError: Shape mismatch: The shape of labels (received (1, 1)) should equal the shape of logits except for the last dimension (received (1, 32, 10)).
Thank you.
As mentioned in that post, categorical cross-entropy (cce) and sparse categorical cross-entropy (scc) use the same loss function; the only difference is the format of the true label Y. Simply put, if Y is an integer you use scc, whereas if Y is one-hot encoded you use cce. So for scc the ground truth Y is mostly 1D, whereas for cce the ground truth Y is mostly 2D. The ground truth shapes are:
- (num_of_samples, n_class_one_hot_encode) <- for cce (2D)
- (num_of_samples, n_class_int) <- for scc (1D)
For example, if we use the cifar10 data set, we can do
import tensorflow as tf
(x_train, y_train), (_, _) = tf.keras.datasets.cifar10.load_data()
# train set / data
x_train = x_train.astype('float32') / 255
sparse = y_train
onehot = y_train
onehot = tf.keras.utils.to_categorical(onehot , num_classes=10)
print(sparse[:5]) # < --- (num_of_samples, n_class_int)
print(onehot[:5]) # < --- (num_of_samples, n_class_one_hot_encode)
[[6]
[9]
[9]
[4]
[1]]
[[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]
Now, let's define a simple model and train using the above both and see what happens.
def net():
    input = tf.keras.Input(shape=(32, 32, 3))
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(input)
    x = tf.keras.layers.MaxPooling2D(3)(x)
    x = tf.keras.layers.GlobalMaxPooling2D()(x)
    x = tf.keras.layers.Dense(10, activation='softmax')(x)
    model = tf.keras.Model(input, x)
    return model
Using cce
model = net()
model.compile(
    loss = tf.keras.losses.CategoricalCrossentropy(),
    metrics = 'accuracy',
    optimizer = 'adam')
his = model.train_on_batch(x_train, onehot, return_dict=True)
print(his)
{'loss': 2.376708984375, 'accuracy': 0.09651999920606613}
one_hot_pred = model.predict(x_train)
print(onehot[0])
print(one_hot_pred[0])
print(onehot[0].shape)
print(one_hot_pred[0].shape)
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0.1516315 0.1151238 0.11732318 0.10644271 0.08946694 0.1398355
0.05046898 0.04249624 0.11813554 0.06907552]
(10,)
(10,)
Now, using scc
model = net()
model.compile(
    loss = tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics = 'accuracy',
    optimizer = 'adam')
his = model.train_on_batch(x_train, sparse, return_dict=True)
print(his)
{'loss': 2.331458806991577, 'accuracy': 0.10066000372171402}
sparse_pred = model.predict(x_train)
print(sparse[0])
print(sparse_pred[0])
print(sparse[0].shape)
print(sparse_pred[0].shape)
[6]
[0.07184976 0.08837385 0.06910037 0.12347631 0.09542189 0.09981853
0.11247937 0.06707954 0.14902702 0.12337337]
(1,)
(10,)
Observe that the gt and pred shapes for scc are (1,) and (10,). In this case, the loss computes the logarithm only for the output index that the ground truth points to. For example, the gt here is 6, so the loss uses only the logarithm of pred[6]. Here are some more details of it.
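As a quick numerical check of that statement, tf.keras.losses.sparse_categorical_crossentropy on a single sample reduces to -log(pred[class]); a minimal sketch with a made-up probability vector:

import numpy as np
import tensorflow as tf

# one softmax output vector and its integer ground-truth class
pred = np.array([[0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.60, 0.03, 0.04, 0.03]], dtype=np.float32)
y_true = np.array([6])

scce = tf.keras.losses.sparse_categorical_crossentropy(y_true, pred)
print(float(scce[0]))       # ~0.51
print(-np.log(pred[0, 6]))  # same value: only pred[6] enters the loss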

Problem with pooled gradients for class activation map (CAM)

I'm using Keras to create gradient class activation maps to visualize my predictions, but for some of the images that I pass through, it returns zeros for the mean intensity of the gradient over a specific feature map channel. I'm using DenseNet121 to classify between two classes.
I've followed the guide Francois Chollet gives for Grad-CAM in his book Deep Learning with Python, and I had exactly the same issue with the VGG16 model as well, so I assume this is unrelated to the model I chose.
My model is as follows:
K.set_image_dim_ordering('tf')
dnet = DenseNet121(include_top=True, weights='imagenet', input_shape=(img_size, img_size, 3))
dnet.trainable=True
x = dnet.output
x = layers.Dense(100, activation='relu')(x)
x = layers.Dropout(0.4)(x)
x = layers.Dense(2, activation='sigmoid')(x)
model = Model(input=dnet.input, output=x)
optimizer = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, decay=0.08)
class_weight = {0:cwr, 1:1}
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
model_hist = model.fit_generator(train, steps_per_epoch = 60, epochs = 2, class_weight = class_weight)
The code for the Grad CAM and associated heatmap:
img = load_img(load_path, target_size=(img_size, img_size))
img_tensor = img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)
x = preprocess_input(img_tensor)
preds = model.predict(x)
t = np.argmax(preds[0])
img = cv2.imread(str(load_path))
llayer = model.output[:,t]
last_conv_layer = model.get_layer(conv)
grads = K.gradients(llayer, last_conv_layer.output)[0]
pooled_grads = K.mean(grads, axis=(0, 1, 2))
iterate = K.function([model.input],[pooled_grads, last_conv_layer.output[0]])
pooled_grads_value, conv_layer_output_value = iterate([img_tensor])
for i in range(len(pooled_grads_value)):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]
heatmap = np.mean(conv_layer_output_value, axis=-1)
heatmap = np.maximum(heatmap, 0)
heatmap /= (np.max(heatmap))
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
heatmap = np.uint8(255 * heatmap)
heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
cv2.imwrite('colormap.jpg', heatmap)
colormap = cv2.imread('colormap.jpg')
I localized the issue as far as I could to the pooled_grads_value. Printing this value gives
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
It also raises the following warning when trying to plot the associated heatmap:
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:85: RuntimeWarning: invalid value encountered in true_divide
An image that works properly returns a pooled_grads_value similar to this:
[ 3.52818736e-16 -2.74286623e-17 -1.04039105e-16 2.26564966e-15
3.80025990e-16 -4.65492462e-16 -3.13070048e-16 -3.99670802e-16
-4.33913274e-16 -1.11373781e-16 -2.18853726e-16 -3.50514463e-16
1.03881816e-17 -6.30468010e-16 -5.57306545e-16 -1.23719115e-16
-3.93387115e-17 7.59074981e-16 -5.14396333e-16 -1.02742529e-16
-2.16168923e-16 1.81140590e-16 -6.42374594e-16 3.01582507e-17
-5.55568844e-17 -3.05854862e-16 3.26082836e-17 2.35498082e-16
7.86424100e-18 6.45698563e-16 -1.54681729e-16 1.11217808e-16]

How can I calculate multi-label top-k precision with TensorFlow?

My task is to predict the five most probable tags for a sentence. Right now I have the unscaled logits from the output (dense) layer:
with tf.name_scope("output"):
scores = tf.nn.xw_plus_b(self.h_drop, W,b, name="scores")
predictions = tf.nn.top_k(self.scores, 5) # should be the k highest score
with tf.name_scope("accuracy"):
labels = input_y # its shape is (batch_size, num_classes)
# calculate the top k accuracy
Now predictions look like [3,1,2,50,12] (3, 1, ... are the indices of the highest scores), while the labels are in "multi-hot" form: [0,1,0,1,1,0,...].
In plain Python/NumPy, I can simply write:
correct_preds = [input_y[i] == 1 for i in predictions]
weighted = np.dot(correct_preds, [5, 4, 3, 2, 1])  # weighted by rank
recall = sum(correct_preds) / sum(input_y)
precision = sum(correct_preds) / len(correct_preds)
But in TensorFlow, what form should I use to complete this task?
Solution
I've coded up an example of how to do the calculations. All of the inputs in this example are coded as tf.constant, but of course you can substitute your own variables.
The main trick is the two matrix multiplications: first, input_y reshaped to 2D is multiplied by a [1x5] ones matrix called to_top5; second, correct_preds is multiplied by the weighted_matrix.
Code
import tensorflow as tf
input_y = tf.constant( [5,2,9,1] , dtype=tf.int32 )
predictions = tf.constant( [[9,3,5,2,1],[8,9,0,6,5],[1,9,3,4,5],[1,2,3,4,5]])
to_top5 = tf.constant( [[1,1,1,1,1]] , dtype=tf.int32 )
input_y_for_top5 = tf.matmul( tf.reshape(input_y,[-1,1]) , to_top5 )
correct_preds = tf.cast( tf.equal( input_y_for_top5 , predictions ) , dtype=tf.float16 )
weighted_matrix = tf.constant( [[5.],[4.],[3.],[2.],[1.]] , dtype=tf.float16 )
weighted = tf.matmul(correct_preds,weighted_matrix)
recall = tf.reduce_sum(correct_preds) / tf.cast( tf.reduce_sum(input_y) , tf.float16)
precision = tf.reduce_sum(correct_preds) / tf.constant(5.0,dtype=tf.float16)
## training
# Run tensorflow and print the result
with tf.Session() as sess:
    print "\n\n=============\n\n"
    print "\ninput_y_for_top5"
    print sess.run(input_y_for_top5)
    print "\ncorrect_preds"
    print sess.run(correct_preds)
    print "\nweighted"
    print sess.run(weighted)
    print "\nrecall"
    print sess.run(recall)
    print "\nprecision"
    print sess.run(precision)
    print "\n\n=============\n\n"
Output
=============
input_y_for_top5
[[5 5 5 5 5]
[2 2 2 2 2]
[9 9 9 9 9]
[1 1 1 1 1]]
correct_preds
[[ 0. 0. 1. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 1. 0. 0. 0.]
[ 1. 0. 0. 0. 0.]]
weighted
[[ 3.]
[ 0.]
[ 4.]
[ 5.]]
recall
0.17651
precision
0.6001
=============
Summary
The above example shows a batch size of 4.
The first sample has a y_label of 5, which means the element with index 5 is the correct label for that sample. Furthermore, the prediction for the first sample is [9,3,5,2,1], meaning the prediction function thinks index 9 is the most likely class, index 3 the next most likely, and so on.
Let's say we want an example of a batch size of 3, then use the following code
input_y = tf.constant( [5,2,9] , dtype=tf.int32 )
predictions = tf.constant( [[9,3,5,2,1],[8,9,0,6,5],[1,9,3,4,5]])
If we substitute the above lines into the program, we can see that it indeed calculates everything correctly for a batch size of 3.
Inspired by wontonimo's answer above, I implemented a method using matrix ops with tf.reshape and tf.gather. The label tensor is "multi-hot", e.g. [[0,1,0,1],[1,0,0,1]]. The prediction tensor is obtained from tf.nn.top_k and looks like [[3,1],[0,1]]. Here is the code:
top_k_pred = tf.nn.top_k(logits, 5)
tmp1 = tf.reshape(tf.range(batch_size) * num_classes, (-1, 1))
idx_incre = top_k_pred[1] + tf.concat([tmp1] * 5, 1)   # flat index: row * num_classes + class
correct_preds = tf.gather(tf.reshape(y_label, (-1,)), tf.reshape(idx_incre, (-1,)))
correct_preds = tf.reshape(correct_preds, (batch_size, 5))
weighted = correct_preds * tf.cast([[5, 4, 3, 2, 1]], correct_preds.dtype)  # weight by rank
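The key trick in this snippet is the flattened indexing: for row r and predicted class c, the matching position in the flattened multi-hot label tensor is r * num_classes + c. A small NumPy sketch of that index arithmetic, using the example label and top-k tensors from this answer:

import numpy as np

num_classes = 4
y_label = np.array([[0, 1, 0, 1],
                    [1, 0, 0, 1]])   # multi-hot labels, shape (2, 4)
top_k_idx = np.array([[3, 1],
                      [0, 1]])       # top-2 predicted class indices per row

batch_size, k = top_k_idx.shape
offsets = np.arange(batch_size).reshape(-1, 1) * num_classes  # row offsets into the flat labels
flat_idx = top_k_idx + offsets                                # r * num_classes + c
correct_preds = y_label.reshape(-1)[flat_idx.reshape(-1)].reshape(batch_size, k)
print(correct_preds)   # [[1 1]
                       #  [1 0]] -> 1 wherever a top-k prediction hits a positive label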

TensorFlow binary classifier outputs predictions for 3 classes instead of 2?

When I print out the predictions, the output includes 3 separate classes, 0, 1, and 2, but I only give it 2 separate classes in the training set, 0 and 1. I'm not sure why this is happening. I'm trying to elaborate on a tutorial from the TensorFlow Machine Learning Cookbook; this is based on the last example of Chapter 2, if anyone has access to it. Note that there are some errors, but that may be an incompatibility with the older version used in the text.
Anyway, I am trying to develop a very rigid structure when building my models so I can get it ingrained in muscle memory. I am instantiating the tf.Graph beforehand for each tf.Session of a set of computations and also setting the number of threads to use. Note that I am using TensorFlow 1.0.1 with Python 3.6.1, so the f"formatstring{var}" syntax won't work if you have an older version of Python.
Where I am getting confused is the last step of the prediction, under the # Accuracy Predictions section. Why am I getting 3 classes for my classification, and why is my accuracy so poor for such a simple classification? I am fairly new at this type of model-based machine learning, so I'm sure it's some syntax error or assumption I have made. Is there an error in my code?
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import multiprocessing
# Set the number of CPUs to use
tf_max_threads = tf.ConfigProto(intra_op_parallelism_threads=multiprocessing.cpu_count())
# Data
seed= 0
size = 50
x = np.concatenate((np.random.RandomState(seed).normal(-1, 1, size),
                    np.random.RandomState(seed).normal(2, 1, size)))
y = np.concatenate((np.repeat(0, size),
                    np.repeat(1, size)))
# Containers
loss_data = list()
A_data = list()
# Graph
G_6 = tf.Graph()
n = 25
# Containers
loss_data = list()
A_data = list()
# Iterations
n_iter = 5000
# Train / Test Set
tr_ratio = 0.8
tr_idx = np.random.RandomState(seed).choice(x.size, round(tr_ratio*x.size), replace=False)
te_idx = np.array(list(set(range(x.size)) - set(tr_idx)))
# Build Graph
with G_6.as_default():
    # Placeholders
    pH_x = tf.placeholder(tf.float32, shape=[None,1], name="pH_x")
    pH_y_hat = tf.placeholder(tf.float32, shape=[None,1], name="pH_y_hat")
    # Train Set
    x_train = x[tr_idx].reshape(-1,1)
    y_train = y[tr_idx].reshape(-1,1)
    # Test Set
    x_test = x[te_idx].reshape(-1,1)
    y_test = y[te_idx].reshape(-1,1)
    # Model
    A = tf.Variable(tf.random_normal(mean=10, stddev=1, shape=[1], seed=seed), name="A")
    model = tf.multiply(pH_x, A)
    # Loss
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=model, labels=pH_y_hat))

with tf.Session(graph=G_6, config=tf_max_threads) as sess:
    sess.run(tf.global_variables_initializer())
    # Optimizer
    op = tf.train.GradientDescentOptimizer(0.03)
    train_step = op.minimize(loss)
    # Train linear model
    for i in range(n_iter):
        idx_random = np.random.RandomState(i).choice(x_train.size, size=n)
        x_tr = x[idx_random].reshape(-1,1)
        y_tr = y[idx_random].reshape(-1,1)
        sess.run(train_step, feed_dict={pH_x:x_tr, pH_y_hat:y_tr})
        # Iterations
        A_iter = sess.run(A)[0]
        loss_iter = sess.run(loss, feed_dict={pH_x:x_tr, pH_y_hat:y_tr}).mean()
        # Append
        loss_data.append(loss_iter)
        A_data.append(A_iter)
        # Log
        if (i + 1) % 1000 == 0:
            print(f"Step #{i + 1}:\tA = {A_iter}", f"Loss = {to_precision(loss_iter)}", sep="\t")
            print()
    # Accuracy Predictions
    A_result = sess.run(A)
    y_ = tf.squeeze(tf.round(tf.nn.sigmoid_cross_entropy_with_logits(logits=model, labels=pH_y_hat)))
    correct_predictions = tf.equal(y_, pH_y_hat)
    accuracy = tf.reduce_mean(tf.cast(correct_predictions, tf.float32))
    print(sess.run(y_, feed_dict={pH_x:x_train, pH_y_hat:y_train}))
    print("Training:",
          f"Accuracy = {sess.run(accuracy, feed_dict={pH_x:x_train, pH_y_hat:y_train})}",
          f"Shape = {x_train.shape}", sep="\t")
    print("Testing:",
          f"Accuracy = {sess.run(accuracy, feed_dict={pH_x:x_test, pH_y_hat:y_test})}",
          f"Shape = {x_test.shape}", sep="\t")

# Plot path
with plt.style.context("seaborn-whitegrid"):
    fig, ax = plt.subplots(nrows=3, figsize=(6,6))
    pd.Series(loss_data,).plot(ax=ax[0], label="loss", legend=True)
    pd.Series(A_data,).plot(ax=ax[1], color="red", label="A", legend=True)
    ax[2].hist(x[:size], np.linspace(-5,5), label="class_0", color="red")
    ax[2].hist(x[size:], np.linspace(-5,5), label="class_1", color="blue")
    alphas = np.linspace(0,0.5, len(A_data))
    for i in range(0, len(A_data), 100):
        alpha = alphas[i]
        a = A_data[i]
        ax[2].axvline(a, alpha=alpha, linestyle="--", color="black")
    ax[2].legend(loc="upper right")
    fig.suptitle("training-process", fontsize=15, y=0.95)
Output Results:
Step #1000: A = 6.72 Loss = 1.13
Step #2000: A = 3.93 Loss = 0.58
Step #3000: A = 2.12 Loss = 0.319
Step #4000: A = 1.63 Loss = 0.331
Step #5000: A = 1.58 Loss = 0.222
[ 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 2.
0. 0. 2. 0. 2. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 2. 0. 0. 0. 0. 0. 0. 0. 1. 0.
1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.
0. 0. 0. 0. 0. 0. 0. 0.]
Training: Accuracy = 0.475 Shape = (80, 1)
Testing: Accuracy = 0.5 Shape = (20, 1)
Your model doesn't do classification
You have a linear regression model: your output variable (model = tf.multiply(pH_x, A)) produces, for each input, a single scalar value with an arbitrary range. That is what you would use for a model that predicts some numeric value, not for a classifier.
Afterwards, you treat it as if it contained a typical n-ary classifier output (e.g. by passing it to sigmoid_cross_entropy_with_logits), but it does not match that function's expectations: in that case the model output should contain multiple values per input datapoint (e.g. 2 in your case), one score corresponding to the probability of each class, which are then often passed through a softmax function to normalize them.
Alternatively, you may want a binary classifier model that outputs a single value, 0 or 1, depending on the class; in that case you want something like the logistic function after the multiplication, and that would need a different loss function, something like a simple mean squared difference, not sigmoid_cross_entropy_with_logits.
Currently, the model as written looks like a mash-up of two different, incompatible tutorials.
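For reference, here is a minimal sketch (with illustrative variable names, not taken from the question) of the single-logit binary setup: one common variant keeps sigmoid_cross_entropy_with_logits on the raw logit for training and obtains class predictions by thresholding tf.sigmoid of that logit at 0.5, so only 0 and 1 can ever appear in the predictions.

import tensorflow as tf

# single-logit binary classifier in TF1 style (names are illustrative)
pH_x = tf.placeholder(tf.float32, shape=[None, 1], name="pH_x")
pH_y = tf.placeholder(tf.float32, shape=[None, 1], name="pH_y")

A = tf.Variable(tf.random_normal(shape=[1]), name="A")
b = tf.Variable(tf.zeros([1]), name="b")
logits = pH_x * A + b                                   # one raw logit per sample

# train on the raw logit
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=pH_y))
train_step = tf.train.GradientDescentOptimizer(0.03).minimize(loss)

# predict by thresholding the sigmoid, not by rounding the loss value
y_prob = tf.sigmoid(logits)
y_hat = tf.round(y_prob)                                # 0.0 or 1.0 only
accuracy = tf.reduce_mean(tf.cast(tf.equal(y_hat, pH_y), tf.float32))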
