How to train a regression model with multiple datasets - Python

The datasets I am working with correspond to individual time-series signals. Each signal is unique, with a differing total number of data points. Here I want to simulate dataset A using dataset B.
Dataset splitting code:
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

x = SmartInsole[:, 0:178]
y = Avg[:, 0]
y = y.reshape(-1, 1)
scaler_x = MinMaxScaler()
scaler_y = MinMaxScaler()
scaler_x.fit(x)
xscale = scaler_x.transform(x)
scaler_y.fit(y)
yscale = scaler_y.transform(y)
X_train, X_test, y_train, y_test = train_test_split(xscale, yscale, test_size=0.25, random_state=2)
The dataset after splitting and normalization (one sample row of x, one y value, and then the shapes):
[0.83974359 0.81818182 0.60264901 0.10457516 0. 0.
0. 0. 0. 0.66878981 0.7654321 0.77439024
0.05031447 0.18674699 0. 0. 0. 0.
0.83892617 0.85620915 0.8590604 0.77852349 0.57236842 0.35333333
0. 0. 0. 0.05217391 0.6835443 0.85064935
0.72955975 0.08275862 0. 0. 0. 0.
0. 0.73758865 0.84868421 0.76923077 0.69230769 0.53472222
0.53571429 0.65714286 0.49450549 0.47747748 0.72592593 0.77707006
0.86928105 0.80519481 0.31333333 0. 0.0516129 0.
0. 0. 0. 0.39316239 0.35036496 0.07086614
0.38392857 0.57843137 0.58181818 0.68376068 0.74100719 0.84868421
0.81879195 0.80519481 0.14 0. 0. 0.
0. 0. 0.83802817 0.89189189 0.88811189 0.48979592
0. 0. 0. 0. 0. 0.33793103
0. 0. 0. 0. 0. 0.9929078
0.97222222 0.81118881 0.45890411 0. 0. 0.
0. 0.63551402 0.97810219 0.95172414 0.95205479 0.88356164
0.94630872 0.40384615 0. 0. 0. 0.97222222
0.9862069 0.96478873 0.76510067 0.52 0.24113475 0.
0. 0. 0.21568627 0.88970588 0.94594595 0.89864865
0.08510638 0.37662338 0.0979021 0. 0. 0.
0.46153846 0.92517007 0.74590164 0.48571429 0.05882353 0.19847328
0.11428571 0.07857143 0.11510791 0.56375839 0.80794702 0.87012987
0.81045752 0.21527778 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0.07042254 0.21052632 0.62745098 0.75471698 0.80503145
0.78980892 0. 0. 0. 0. 0.
0. 0.55357143 0.66878981 0.67272727 0.17682927 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. ]
[0.59662633]
(3000, 178)
(3000, 1)
I am working with Keras and trying to fit a ResNet50-style model to the data just to evaluate it.
Below is my ResNet model structure.
The identity block:
def identity_block(input_tensor, units):
    """The identity block is the block that has no dense layer at the shortcut.
    # Arguments
        input_tensor: input tensor
        units: output shape
    # Returns
        Output tensor for the block.
    """
    x = layers.Dense(units)(input_tensor)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Dense(units)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Dense(units)(x)
    x = layers.BatchNormalization()(x)
    x = layers.add([x, input_tensor])
    x = layers.Activation('relu')(x)
    return x
The dens_block (a block with a dense layer at the shortcut):
def dens_block(input_tensor, units):
    """A block that has a dense layer at the shortcut.
    # Arguments
        input_tensor: input tensor
        units: output shape
    # Returns
        Output tensor for the block.
    """
    x = layers.Dense(units)(input_tensor)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Dense(units)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Dense(units)(x)
    x = layers.BatchNormalization()(x)
    shortcut = layers.Dense(units)(input_tensor)
    shortcut = layers.BatchNormalization()(shortcut)
    x = layers.add([x, shortcut])
    x = layers.Activation('relu')(x)
    return x
The ResNet50 model:
def ResNet50Regression():
    Res_input = layers.Input(shape=(178,))
    width = 16
    x = dens_block(Res_input, width)
    x = identity_block(x, width)
    x = identity_block(x, width)
    x = dens_block(x, width)
    x = identity_block(x, width)
    x = identity_block(x, width)
    x = dens_block(x, width)
    x = identity_block(x, width)
    x = identity_block(x, width)
    x = layers.BatchNormalization()(x)
    x = layers.Dense(1, activation="linear")(x)
    model = models.Model(inputs=Res_input, outputs=x)
    return model
Essentially, I am fitting the model to each dataset as follows:
import datetime
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

model = ResNet50Regression()
model.compile(loss='mse', optimizer=Adam(learning_rate=0.0001), metrics=['mse'])
model.summary()

starttime = datetime.datetime.now()
history = model.fit(X_train, y_train, epochs=200, batch_size=64, verbose=2,
                    validation_data=(X_test, y_test))
endtime = datetime.datetime.now()
How can I get optimal prediction results from the above model?
Below is my current prediction result (plot not shown): the generated predictions do not track the real values properly. How can I make the prediction results correspond to the real values?
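Since y was scaled to [0, 1] with MinMaxScaler, the model's raw predictions are also on that scale. A minimal sketch (standard scikit-learn API, using the scaler_y fitted above) for mapping predictions back to the original units before comparing them against the real values:

# Map scaled predictions back to original units with the fitted scaler
y_pred_scaled = model.predict(X_test)               # predictions in [0, 1]
y_pred = scaler_y.inverse_transform(y_pred_scaled)  # original units
y_true = scaler_y.inverse_transform(y_test)         # targets in original units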

Related

Why is the MSE high when I inverse-transform the data?

I built a regression model to predict force-plate data from plantar pressure, using a CNN. I have two different datasets: dataset A (force-plate data) and dataset B (plantar-pressure data). Before training I normalize the data with min-max scaling. During training I get good results and a low MSE. Then I test the model and map the normalized predictions back to the original scale using the inverse function. After inverting the data, the MSE value I get is very high, even though the prediction results themselves look very good to me. Why does the MSE value suddenly become so high?
The data look like this:
Force Plate Data: [ 4.46733, 4.39629, -34.2351 , -4077.23 , -6206.81 ,
-874.539 ]
Force Plate Data Shape : (15000,6)
SmartInsole Data: [ 0. 0. 0. 13. 1. 0. 0. 0. 0. 0.
15. 92. 60. 0.
36. 0. 0. 0. 0. 0. 0. 62. 80. 58. 37. 0. 0. 0.
0. 40. 83. 72. 32. 22. 0. 0. 0. 0. 0. 0. 98. 108.
74. 56. 30. 17. 0. 0. 44. 121. 127. 83. 0. 0. 0. 0.
0. 3. 83. 64. 63. 63. 77. 70. 43. 55. 115. 138. 144. 137.
0. 0. 0. 0. 66. 107. 127. 146. 150. 52. 0. 0. 0. 129.
133. 18. 0. 0. 0.]
SmartInsole Data Shape : (15000,89)
Here is my model code:
## Load Data
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense
from tensorflow.keras.optimizers import Adam

Insole = pd.read_csv('1225_Rwalk10min1_list.txt', header=None, low_memory=False)
SIData = np.array(Insole)
df = pd.read_csv('1225_Rwalk10min.csv', low_memory=False)
columns = ['Fx','Fy','Fz','Mx','My','Mz']
selected_df = df[columns]
FPDatas = selected_df[:15000]
label = pd.read_csv('label.txt', header=None, low_memory=False)
labelData = np.array(label).astype('float32')
SmartInsole = np.array(SIData[:15000]).astype('float32')
FPData = np.array(FPDatas).astype('float32')
Label = np.array(labelData[:15000]).astype('float32')
SIlabeled = np.concatenate((Label, SmartInsole), axis=1)
SIlabeled = np.array(SIlabeled).astype('float32')
## End Load Data
# Data Normalization
minInsole = SIlabeled.min()
maxInsole = SIlabeled.max()
xscale = (SIlabeled - minInsole) / (maxInsole - minInsole)

FPmax = []
FPmin = []
yscale = []
for i in range(0, 6):
    minFP = FPData[:, i].min()
    maxFP = FPData[:, i].max()
    FPmin.append(minFP)
    FPmax.append(maxFP)
FPmin = np.array(FPmin)
FPmax = np.array(FPmax)
for i in range(0, 6):
    scale = (FPData[:, i] - FPmin[i]) / (FPmax[i] - FPmin[i])
    yscale.append(scale)
yscale = np.array(yscale)
yscale = yscale.transpose()
# End Data Normalization
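# Note: the two loops above are equivalent to a single vectorized broadcast
# over axis 0 (a sketch, assuming FPData is a (15000, 6) float array):
#   FPmin = FPData.min(axis=0)
#   FPmax = FPData.max(axis=0)
#   yscale = (FPData - FPmin) / (FPmax - FPmin)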
# Splitting Data
sample_size = xscale.shape[0]
time_steps = xscale.shape[1]
input_dimension = 1
train_data_reshaped = xscale.reshape(sample_size, time_steps, input_dimension)
X_train, X_test, y_train, y_test = train_test_split(train_data_reshaped, yscale, test_size=0.20, random_state=2)
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
# End Splitting Data
# Model Structure
n_timesteps = train_data_reshaped.shape[1]
n_features = train_data_reshaped.shape[2]

model = Sequential(name="model_conv1D")
model.add(Input(shape=(n_timesteps, n_features)))
model.add(Conv1D(filters=64, kernel_size=3, padding='same', activation='relu'))
model.add(Conv1D(filters=128, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters=256, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(500, activation='relu'))
model.add(Dense(6, activation='sigmoid'))
model.summary()

model.compile(loss='mse', optimizer=Adam(learning_rate=0.002), metrics=['mse'])
history = model.fit(X_train, y_train, batch_size=64, epochs=200,
                    validation_data=(X_test, y_test), verbose=2)
# End Model Structure
# Evaluate Model
model.evaluate(train_data_reshaped, yscale)
ypred = model.predict(train_data_reshaped)

plt.figure()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper right')
# plt.show()
plt.savefig('Loss Result.png')

print('MSE: ', mean_squared_error(yscale, ypred))
print('RMSE: ', math.sqrt(mean_squared_error(yscale, ypred)))
print('Coefficient of determination (r2 Score): ', r2_score(yscale, ypred))

# Inverse
y_inverse = []
y_pred_inverse = []
for i in range(0, 6):
    Y_inver = yscale[0:15000, i] * (FPmax[i] - FPmin[i]) + FPmin[i]
    Pred_inver = ypred[0:15000, i] * (FPmax[i] - FPmin[i]) + FPmin[i]
    y_inverse.append(Y_inver)
    y_pred_inverse.append(Pred_inver)
y_inverse = np.array(y_inverse).transpose()
y_pred_inverse = np.array(y_pred_inverse).transpose()

print('MSE: ', mean_squared_error(y_inverse, y_pred_inverse))
print('RMSE: ', math.sqrt(mean_squared_error(y_inverse, y_pred_inverse)))
print('Coefficient of determination (r2 Score): ', r2_score(y_inverse, y_pred_inverse))

colors = ['red','green','brown','teal','gray','black','maroon','orange','purple']
colors2 = ['green','red','orange','black','maroon','teal','blue','gray','brown']
x = np.arange(0, 3000) * 60 / 3000
for i in range(0, 6):
    plt.figure(figsize=(15, 6))
    plt.plot(x, y_inverse[0:3000, i], color='red')
    plt.plot(x, y_pred_inverse[0:3000, i], markerfacecolor='none', color='green')
    plt.title('CNN Regression (Training Data)')
    if i < 3:
        plt.ylabel('Force/' + columns[i])
    else:
        plt.ylabel('Moment/' + columns[i])
    plt.xlabel('Time(s)')
    plt.legend(['Real value', 'Predicted Value'], loc='best')
    plt.show()
# End Evaluate Model
The model loss (plot not shown).
The MSE using the normalized data:
MSE: 0.00033982666
RMSE: 0.018434387873554003
Coefficient of determination (r2 Score): 0.9934412267882915
The MSE after inverting the data:
MSE: 711726.3
RMSE: 843.6387334042931
Coefficient of determination (r2 Score): 0.9934412272949391
Prediction result samples:
print("Real Value : ",y_inverse[51])
print("prediction Value : ",y_pred_inverse[51])
Real Value : [ 4.46733 4.39629 -34.235107 -4077.2305 -6206.8125
-874.53906 ]
prediction Value : [ 6.6143274 5.6351166 -31.929413 -3412.164 -6177.2344
-2047.6455 ]
How can I make the MSE value not change when the data is inverted?
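A minimal numeric sketch of what is likely happening: min-max scaling divides every error by (max - min) per feature, so the MSE on the original scale is (max - min)² times the MSE on the normalized scale, while r² is scale-invariant (which is why it is identical in both printouts above). The values and names below are illustrative, not from the question's data:

import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(0, 1000, 1000)        # a target whose range spans thousands
p = y + rng.normal(0, 30, 1000)      # predictions with small absolute errors
lo, hi = y.min(), y.max()
ys, ps = (y - lo) / (hi - lo), (p - lo) / (hi - lo)

mse_scaled = np.mean((ys - ps) ** 2)
mse_orig = np.mean((y - p) ** 2)
print(mse_orig / mse_scaled)         # equals (hi - lo) ** 2 exactly
print((hi - lo) ** 2)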

keras, sparse_categorical_crossentropy label Y dimension and value range

Can someone please explain the dimensionality logic for input X and class Y for the sparse_categorical_crossentropy loss function?
I checked both the Keras and TF2 docs and examples, and this post: Cross Entropy vs Sparse. One point is not clear to me:
Does the Y vector need to be expanded to the same number of columns as the number of classes the model outputs (if I use a softmax output), or does Keras automatically expand Y?
In my case, I have 32x32 input images, and Y is a number between 0 and 10.
So the input is (batch_size, h, w), and Y is (batch_size, integer value in 0...10).
X = (73257, 32, 32)
Y = (73257, 1)
model.fit(X, Y, epochs=30, validation_split=0.10, batch_size=1, verbose=True)
The model itself is just a Sequential stack of Dense layers with a softmax output.
model = Sequential()
model.add(Dense(32, activation='relu',
                input_shape=input_shape,
                kernel_initializer='he_uniform',
                bias_initializer='ones'))
# bunch of Dense layers and output softmax
model.add(Dense(10, activation='softmax'))
The error is a dimensionality mismatch:
ValueError: Shape mismatch: The shape of labels (received (1, 1)) should equal the shape of logits except for the last dimension (received (1, 32, 10)).
Thank you.
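A side note on the shape-mismatch error itself: Dense layers act only on the last axis, so with inputs of shape (batch, 32, 32) the softmax output has shape (batch, 32, 10), which is the (1, 32, 10) logits shape in the message. A minimal sketch of one likely fix, flattening before the Dense stack (assuming grayscale 32x32 inputs):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

model = Sequential()
model.add(Flatten(input_shape=(32, 32)))    # (batch, 1024)
model.add(Dense(32, activation='relu'))
model.add(Dense(10, activation='softmax'))  # logits shape (batch, 10)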
As mentioned in that post, both categorical cross-entropy (cce) and sparse categorical cross-entropy (scc) compute the same loss; the only difference is the format of the true label Y. Simply put, if Y is an integer you use scc, whereas if Y is one-hot you use cce. So for scc the ground truth Y is mostly 1D, whereas for cce it is mostly 2D. For the ground truth:
- (num_of_samples, n_class_one_hot_encode) <- for cce (2D)
- (num_of_samples, n_class_int) <- for scc (1D)
For example, if we use the cifar10 data set, we can do
import tensorflow as tf
(x_train, y_train), (_, _) = tf.keras.datasets.cifar10.load_data()
# train set / data
x_train = x_train.astype('float32') / 255
sparse = y_train
onehot = y_train
onehot = tf.keras.utils.to_categorical(onehot, num_classes=10)
print(sparse[:5]) # < --- (num_of_samples, n_class_int)
print(onehot[:5]) # < --- (num_of_samples, n_class_one_hot_encode)
[[6]
[9]
[9]
[4]
[1]]
[[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]
Now, let's define a simple model, train it with both label formats, and see what happens.
def net():
    input = tf.keras.Input(shape=(32, 32, 3))
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(input)
    x = tf.keras.layers.MaxPooling2D(3)(x)
    x = tf.keras.layers.GlobalMaxPooling2D()(x)
    x = tf.keras.layers.Dense(10, activation='softmax')(x)
    model = tf.keras.Model(input, x)
    return model
Using cce
model = net()
model.compile(
    loss = tf.keras.losses.CategoricalCrossentropy(),
    metrics = 'accuracy',
    optimizer = 'adam')
his = model.train_on_batch(x_train, onehot, return_dict=True)
print(his)
{'loss': 2.376708984375, 'accuracy': 0.09651999920606613}
one_hot_pred = model.predict(x_train)
print(onehot[0])
print(one_hot_pred[0])
print(onehot[0].shape)
print(one_hot_pred[0].shape)
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0.1516315 0.1151238 0.11732318 0.10644271 0.08946694 0.1398355
0.05046898 0.04249624 0.11813554 0.06907552]
(10,)
(10,)
Now, using scc
model = net()
model.compile(
    loss = tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics = 'accuracy',
    optimizer = 'adam')
his = model.train_on_batch(x_train, sparse, return_dict=True)
print(his)
{'loss': 2.331458806991577, 'accuracy': 0.10066000372171402}
sparse_pred = model.predict(x_train)
print(sparse[0])
print(sparse_pred[0])
print(sparse[0].shape)
print(sparse_pred[0].shape)
[6]
[0.07184976 0.08837385 0.06910037 0.12347631 0.09542189 0.09981853
0.11247937 0.06707954 0.14902702 0.12337337]
(1,)
(10,)
Observe that the gt and pred shapes for scc are (1,) and (10,). In this case, the loss computes the logarithm only for the output index that the ground truth points to. For example, the gt here is 6, so from pred the loss computes only the logarithm of pred[6]. Here are some more details on it.
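A quick numeric check of that claim (the probabilities below are rounded stand-ins for the sparse_pred[0] printout above, not exact values):

import numpy as np
import tensorflow as tf

pred = np.array([0.07, 0.09, 0.07, 0.12, 0.10, 0.10, 0.11, 0.07, 0.15, 0.12])
print(-np.log(pred[6]))   # by hand: -log(pred[gt]) with gt = 6
print(tf.keras.losses.sparse_categorical_crossentropy([6], [pred]).numpy())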

Problem with pooled gradients for class activation map (CAM)

I'm using Keras to create gradient class activation maps to visualize my predictions, but for some of the images that I pass through, it returns zeros for the mean intensity of the gradient over a specific feature map channel. I'm using DenseNet121 to classify between two classes.
I've followed the guide that Francois Chollet wrote for Grad-CAMs in his book Deep Learning with Python, and I had the exact same issue with the VGG16 model as well, so I'm assuming this is unrelated to the model I chose.
My model is as follows:
K.set_image_dim_ordering('tf')
dnet = DenseNet121(include_top=True, weights='imagenet', input_shape=(img_size, img_size, 3))
dnet.trainable=True
x = dnet.output
x = layers.Dense(100, activation='relu')(x)
x = layers.Dropout(0.4)(x)
x = layers.Dense(2, activation='sigmoid')(x)
model = Model(input=dnet.input, output=x)
optimizer = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, decay=0.08)
class_weight = {0:cwr, 1:1}
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
model_hist = model.fit_generator(train, steps_per_epoch = 60, epochs = 2, class_weight = class_weight)
The code for the Grad CAM and associated heatmap:
img = load_img(load_path, target_size=(img_size, img_size))
img_tensor = img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)
x = preprocess_input(img_tensor)
preds = model.predict(x)
t = np.argmax(preds[0])
img = cv2.imread(str(load_path))
llayer = model.output[:,t]
last_conv_layer = model.get_layer(conv)
grads = K.gradients(llayer, last_conv_layer.output)[0]
pooled_grads = K.mean(grads, axis=(0, 1, 2))
iterate = K.function([model.input],[pooled_grads, last_conv_layer.output[0]])
pooled_grads_value, conv_layer_output_value = iterate([img_tensor])
for i in range(len(pooled_grads_value)):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]
heatmap = np.mean(conv_layer_output_value, axis=-1)
heatmap = np.maximum(heatmap, 0)
heatmap /= (np.max(heatmap))
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
heatmap = np.uint8(255 * heatmap)
heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
cv2.imwrite('colormap.jpg', heatmap)
colormap = cv2.imread('colormap.jpg')
I localized the issue as far as I could to the pooled_grads_value. Printing this value gives
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
It also produces the following warning when trying to plot the associated heatmap:
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:85: RuntimeWarning: invalid value encountered in true_divide
An example of an image that works properly will return pooled_grads_value as something similar to:
[ 3.52818736e-16 -2.74286623e-17 -1.04039105e-16 2.26564966e-15
3.80025990e-16 -4.65492462e-16 -3.13070048e-16 -3.99670802e-16
-4.33913274e-16 -1.11373781e-16 -2.18853726e-16 -3.50514463e-16
1.03881816e-17 -6.30468010e-16 -5.57306545e-16 -1.23719115e-16
-3.93387115e-17 7.59074981e-16 -5.14396333e-16 -1.02742529e-16
-2.16168923e-16 1.81140590e-16 -6.42374594e-16 3.01582507e-17
-5.55568844e-17 -3.05854862e-16 3.26082836e-17 2.35498082e-16
7.86424100e-18 6.45698563e-16 -1.54681729e-16 1.11217808e-16]
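For what it's worth, the true_divide warning itself comes from normalizing an all-zero heatmap: when every pooled gradient is zero, np.max(heatmap) is zero and the division produces NaNs. A small guard (a sketch; it only suppresses the NaNs, it does not explain why the gradients vanish):

heatmap = np.maximum(heatmap, 0)
heatmap /= (np.max(heatmap) + K.epsilon())  # avoid dividing by zero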

A strange bug in tf.scatter_add when I implement unpool in tensorflow

I'm trying to implement unpooling in TensorFlow with tf.scatter_add, but I've run into a strange bug. Here is my code:
import tensorflow as tf
import numpy as np
import random
tf.reset_default_graph()
mat = list(range(64))
random.shuffle(mat)
mat = np.array(mat)
mat = np.reshape(mat, [1,8,8,1])
M = tf.constant(mat, dtype=tf.float32)
pool1, argmax1 = tf.nn.max_pool_with_argmax(M, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
pool2, argmax2 = tf.nn.max_pool_with_argmax(pool1, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
pool3, argmax3 = tf.nn.max_pool_with_argmax(pool2, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
def unpool(x, argmax, strides, unpool_shape=None, batch_size=None, name='unpool'):
    x_shape = x.get_shape().as_list()
    argmax_shape = argmax.get_shape().as_list()
    assert not (x_shape[0] is None and batch_size is None), "must input batch_size if the batch dimension is variable"
    if x_shape[0] is None:
        x_shape[0] = batch_size
    if argmax_shape[0] is None:
        argmax_shape[0] = x_shape[0]
    if unpool_shape is None:
        unpool_shape = [x_shape[i] * strides[i] for i in range(4)]
    x_unpool = tf.get_variable(name=name, shape=[np.prod(unpool_shape)], initializer=tf.zeros_initializer(), trainable=False)
    argmax = tf.cast(argmax, tf.int32)
    argmax = tf.reshape(argmax, [np.prod(argmax_shape)])
    x = tf.reshape(x, [np.prod(argmax.get_shape().as_list())])
    x_unpool = tf.scatter_add(x_unpool, argmax, x)
    x_unpool = tf.reshape(x_unpool, unpool_shape)
    return x_unpool
unpool2 = unpool(pool3, argmax3, strides=[1,2,2,1], name='unpool3')
unpool1 = unpool(unpool2, argmax2, strides=[1,2,2,1], name='unpool2')
unpool0 = unpool(unpool1, argmax1, strides=[1,2,2,1], name='unpool1')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    mat_out = mat[:, :, :, 0]
    pool1_out = sess.run(pool1)[0, :, :, 0]
    pool2_out = sess.run(pool2)[0, :, :, 0]
    pool3_out = sess.run(pool3)[0, :, :, 0]
    argmax1_out = sess.run(argmax1)[0, :, :, 0]
    argmax2_out = sess.run(argmax2)[0, :, :, 0]
    argmax3_out = sess.run(argmax3)[0, :, :, 0]
    unpool2_out = sess.run(unpool2)[0, :, :, 0]
    unpool1_out = sess.run(unpool1)[0, :, :, 0]
    unpool0_out = sess.run(unpool0)[0, :, :, 0]
    print(unpool2_out)
    print(unpool1_out)
    print(unpool0_out)
output:
[[ 0. 0.]
[ 0. 63.]]
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 126. 0.]
[ 0. 0. 0. 0.]]
[[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 315. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]]
The locations are right, but the values are wrong: unpool2 is right, unpool1 is double the expected value, and unpool0 is five times the expected value. I don't know what's wrong; can anyone tell me how to fix this bug?
Great thanks in advance.
In fact, the answer is simple. For convenience, I renamed some variables; look at this code:
def unpool(x, argmax, strides, unpool_shape=None, batch_size=None, name='unpool'):
    x_shape = x.get_shape().as_list()
    argmax_shape = argmax.get_shape().as_list()
    assert not (x_shape[0] is None and batch_size is None), "must input batch_size if the batch dimension is variable"
    if x_shape[0] is None:
        x_shape[0] = batch_size
    if argmax_shape[0] is None:
        argmax_shape[0] = x_shape[0]
    if unpool_shape is None:
        unpool_shape = [x_shape[i] * strides[i] for i in range(4)]
    x_unpool = tf.get_variable(name=name, shape=[np.prod(unpool_shape)], initializer=tf.zeros_initializer(), trainable=False)
    argmax = tf.cast(argmax, tf.int32)
    argmax = tf.reshape(argmax, [np.prod(argmax_shape)])
    x = tf.reshape(x, [np.prod(argmax.get_shape().as_list())])
    x_unpool_add = tf.scatter_add(x_unpool, argmax, x)
    x_unpool_reshape = tf.reshape(x_unpool_add, unpool_shape)
    return x_unpool_reshape
x_unpool_add is a tf.scatter_add op, and every time we evaluate x_unpool_reshape, x_unpool_add is executed again, so x_unpool accumulates x one more time. In my original code, I evaluate unpool2, unpool1, and unpool0 in order: the x_unpool_add of unpool2 runs first; then, when we evaluate unpool1, unpool2 has to be recomputed, so its x_unpool_add runs again. That is equivalent to calling x_unpool_add twice, and the value comes out wrong. If we evaluated unpool2 alone, we would get the right result. So replacing tf.scatter_add with tf.scatter_update avoids this bug.
This code reproduces the effect directly:
import tensorflow as tf

t1 = tf.get_variable(name='t1', shape=[1], dtype=tf.float32, initializer=tf.zeros_initializer())
t2 = tf.get_variable(name='t2', shape=[1], dtype=tf.float32, initializer=tf.zeros_initializer())
d = tf.scatter_add(t1, [0], [1])
e = tf.scatter_add(t2, [0], d)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    d_out1 = sess.run(d)
    d_out2 = sess.run(d)
    e_out = sess.run(e)
    print(d_out1)
    print(d_out2)
    print(e_out)
output:
[1.]
[2.]
[3.]
Using tf.scatter_update avoids this:
import tensorflow as tf
import numpy as np
import random

tf.reset_default_graph()

mat = list(range(64))
random.shuffle(mat)
mat = np.array(mat)
mat = np.reshape(mat, [1, 8, 8, 1])
M = tf.constant(mat, dtype=tf.float32)
pool1, argmax1 = tf.nn.max_pool_with_argmax(M, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
pool2, argmax2 = tf.nn.max_pool_with_argmax(pool1, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
pool3, argmax3 = tf.nn.max_pool_with_argmax(pool2, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')

def unpool(x, argmax, strides, unpool_shape=None, batch_size=None, name='unpool'):
    x_shape = x.get_shape().as_list()
    argmax_shape = argmax.get_shape().as_list()
    assert not (x_shape[0] is None and batch_size is None), "must input batch_size if the batch dimension is variable"
    if x_shape[0] is None:
        x_shape[0] = batch_size
    if argmax_shape[0] is None:
        argmax_shape[0] = x_shape[0]
    if unpool_shape is None:
        unpool_shape = [x_shape[i] * strides[i] for i in range(4)]
    unpool = tf.get_variable(name=name, shape=[np.prod(unpool_shape)], initializer=tf.zeros_initializer(), trainable=False)
    argmax = tf.cast(argmax, tf.int32)
    argmax = tf.reshape(argmax, [np.prod(argmax_shape)])
    x = tf.reshape(x, [np.prod(argmax.get_shape().as_list())])
    unpool = tf.scatter_update(unpool, argmax, x)
    unpool = tf.reshape(unpool, unpool_shape)
    return unpool

unpool2 = unpool(pool3, argmax3, strides=[1,2,2,1], name='unpool3')
unpool1 = unpool(unpool2, argmax2, strides=[1,2,2,1], name='unpool2')
unpool0 = unpool(unpool1, argmax1, strides=[1,2,2,1], name='unpool1')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    mat_out = mat[:, :, :, 0]
    pool1_out = sess.run(pool1)[0, :, :, 0]
    pool2_out = sess.run(pool2)[0, :, :, 0]
    pool3_out = sess.run(pool3)[0, :, :, 0]
    argmax1_out = sess.run(argmax1)[0, :, :, 0]
    argmax2_out = sess.run(argmax2)[0, :, :, 0]
    argmax3_out = sess.run(argmax3)[0, :, :, 0]
    unpool2_out = sess.run(unpool2)[0, :, :, 0]
    unpool1_out = sess.run(unpool1)[0, :, :, 0]
    unpool0_out = sess.run(unpool0)[0, :, :, 0]
    print(unpool2_out)
    print(unpool1_out)
    print(unpool0_out)
output:
[[ 0. 0.]
[ 0. 63.]]
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 63.]
[ 0. 0. 0. 0.]]
[[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 63.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]]

TensorFlow binary classifier outputs predictions for 3 classes instead of 2?

When I print out the predictions, the output includes 3 separate classes, 0, 1, and 2, but I only give it 2 separate classes in the training set, 0 and 1. I'm not sure why this is happening. I'm trying to elaborate on a tutorial from the TensorFlow Machine Learning Cookbook; this is based on the last example of Chapter 2, if anyone has access to it. Note there are some errors, but that may be an incompatibility with the older version from the text.
Anyway, I am trying to develop a very rigid structure when building my models so I can get it ingrained in muscle memory. I am instantiating the tf.Graph beforehand for each tf.Session of a set of computations and also setting the number of threads to use. Note that I am using TensorFlow 1.0.1 with Python 3.6.1, so the f"formatstring{var}" won't work if you have an older version of Python.
Where I am getting confused is the last step in the prediction, under the # Accuracy Predictions section. Why am I getting 3 classes for my classification, and why is my accuracy so poor for such a simple classification? I am fairly new at this type of model-based machine learning, so I'm sure it's some syntax error or assumption I have made. Is there an error in my code?
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import multiprocessing

# Set number of CPUs to use
tf_max_threads = tf.ConfigProto(intra_op_parallelism_threads=multiprocessing.cpu_count())

# Data
seed = 0
size = 50
x = np.concatenate((np.random.RandomState(seed).normal(-1, 1, size),
                    np.random.RandomState(seed).normal(2, 1, size)))
y = np.concatenate((np.repeat(0, size),
                    np.repeat(1, size)))

# Graph
G_6 = tf.Graph()
n = 25

# Containers
loss_data = list()
A_data = list()

# Iterations
n_iter = 5000

# Train / Test Set
tr_ratio = 0.8
tr_idx = np.random.RandomState(seed).choice(x.size, round(tr_ratio*x.size), replace=False)
te_idx = np.array(list(set(range(x.size)) - set(tr_idx)))

# Build Graph
with G_6.as_default():
    # Placeholders
    pH_x = tf.placeholder(tf.float32, shape=[None, 1], name="pH_x")
    pH_y_hat = tf.placeholder(tf.float32, shape=[None, 1], name="pH_y_hat")
    # Train Set
    x_train = x[tr_idx].reshape(-1, 1)
    y_train = y[tr_idx].reshape(-1, 1)
    # Test Set
    x_test = x[te_idx].reshape(-1, 1)
    y_test = y[te_idx].reshape(-1, 1)
    # Model
    A = tf.Variable(tf.random_normal(mean=10, stddev=1, shape=[1], seed=seed), name="A")
    model = tf.multiply(pH_x, A)
    # Loss
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=model, labels=pH_y_hat))
    with tf.Session(graph=G_6, config=tf_max_threads) as sess:
        sess.run(tf.global_variables_initializer())
        # Optimizer
        op = tf.train.GradientDescentOptimizer(0.03)
        train_step = op.minimize(loss)
        # Train linear model
        for i in range(n_iter):
            idx_random = np.random.RandomState(i).choice(x_train.size, size=n)
            x_tr = x[idx_random].reshape(-1, 1)
            y_tr = y[idx_random].reshape(-1, 1)
            sess.run(train_step, feed_dict={pH_x: x_tr, pH_y_hat: y_tr})
            # Iterations
            A_iter = sess.run(A)[0]
            loss_iter = sess.run(loss, feed_dict={pH_x: x_tr, pH_y_hat: y_tr}).mean()
            # Append
            loss_data.append(loss_iter)
            A_data.append(A_iter)
            # Log
            if (i + 1) % 1000 == 0:
                print(f"Step #{i + 1}:\tA = {A_iter}", f"Loss = {to_precision(loss_iter)}", sep="\t")
                print()
        # Accuracy Predictions
        A_result = sess.run(A)
        y_ = tf.squeeze(tf.round(tf.nn.sigmoid_cross_entropy_with_logits(logits=model, labels=pH_y_hat)))
        correct_predictions = tf.equal(y_, pH_y_hat)
        accuracy = tf.reduce_mean(tf.cast(correct_predictions, tf.float32))
        print(sess.run(y_, feed_dict={pH_x: x_train, pH_y_hat: y_train}))
        print("Training:",
              f"Accuracy = {sess.run(accuracy, feed_dict={pH_x: x_train, pH_y_hat: y_train})}",
              f"Shape = {x_train.shape}", sep="\t")
        print("Testing:",
              f"Accuracy = {sess.run(accuracy, feed_dict={pH_x: x_test, pH_y_hat: y_test})}",
              f"Shape = {x_test.shape}", sep="\t")

# Plot path
with plt.style.context("seaborn-whitegrid"):
    fig, ax = plt.subplots(nrows=3, figsize=(6, 6))
    pd.Series(loss_data).plot(ax=ax[0], label="loss", legend=True)
    pd.Series(A_data).plot(ax=ax[1], color="red", label="A", legend=True)
    ax[2].hist(x[:size], np.linspace(-5, 5), label="class_0", color="red")
    ax[2].hist(x[size:], np.linspace(-5, 5), label="class_1", color="blue")
    alphas = np.linspace(0, 0.5, len(A_data))
    for i in range(0, len(A_data), 100):
        alpha = alphas[i]
        a = A_data[i]
        ax[2].axvline(a, alpha=alpha, linestyle="--", color="black")
    ax[2].legend(loc="upper right")
    fig.suptitle("training-process", fontsize=15, y=0.95)
Output Results:
Step #1000: A = 6.72 Loss = 1.13
Step #2000: A = 3.93 Loss = 0.58
Step #3000: A = 2.12 Loss = 0.319
Step #4000: A = 1.63 Loss = 0.331
Step #5000: A = 1.58 Loss = 0.222
[ 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 2.
0. 0. 2. 0. 2. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 2. 0. 0. 0. 0. 0. 0. 0. 1. 0.
1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.
0. 0. 0. 0. 0. 0. 0. 0.]
Training: Accuracy = 0.475 Shape = (80, 1)
Testing: Accuracy = 0.5 Shape = (20, 1)
Your model doesn't do classification
You have a linear regression model; i.e., your output variable (model = tf.multiply(pH_x, A)) emits, for each input, a single scalar value with an arbitrary range. That's generally what you'd have for a prediction model that needs to predict some numeric value, not for a classifier.
Afterwards, you treat it as if it contained a typical n-ary classifier output (e.g. by passing it to sigmoid_cross_entropy_with_logits), but it does not match the expectations of that function: there, the 'shape' of the model variable should be multiple values per input datapoint (e.g. 2 in your case), each corresponding to some metric of the probability of each class, often then passed to a softmax function to normalize them.
Alternatively, you may want a binary classifier model that outputs a single value, 0 or 1, depending on the class; in that case, you want something like the logistic function after the multiplication, and that would need a different loss function, something like a simple mean squared difference, not sigmoid_cross_entropy_with_logits.
Currently, the model as written seems like a mash-up of two different, incompatible tutorials.
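For reference, a minimal sketch of the single-output binary variant, staying in the question's TF1-style graph: keep the model output as a raw logit and train with sigmoid cross-entropy as before, but predict the class by rounding the sigmoid of the logit rather than rounding the loss (rounding the loss is what produced values outside {0, 1}):

logits = tf.multiply(pH_x, A)   # one raw logit per input
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=pH_y_hat))
y_ = tf.round(tf.nn.sigmoid(logits))   # predictions in {0, 1}
correct_predictions = tf.equal(y_, pH_y_hat)
accuracy = tf.reduce_mean(tf.cast(correct_predictions, tf.float32))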
