After my earlier, related question was answered, I realized that I have a different question.
I would like to have a different objective component depending on the batch that I pass during a training step. Suppose my batch size is one, and each training sample is associated with two supporter vectors that are not themselves part of the training data. I therefore need to figure out which input vector is currently being processed.
import numpy as np
import keras.backend as K
from keras.layers import Dense, Input
from keras.models import Model
features = np.random.rand(100, 5)
labels = np.random.rand(100, 2)
holder = np.random.rand(200, 5) # each feature gets two supporters.
indices = np.arange(start=1, stop=holder.shape[0] + 1, step=1)
supporters = {}
for i, j in zip(indices, holder):  # keys (2k-1, 2k) hold the supporters of the kth training point
    supporters[i] = j
For instance, the first two entries of supporters belong to the first point in features.
features[0] [0.71444629 0.77256729 0.95375736 0.18759234 0.8207317 ]
has the following two supporters.
1: array([0.76281692, 0.18698215, 0.11687052, 0.78084761, 0.10293403]),
2: array([0.98229912, 0.08784577, 0.08109571, 0.23665783, 0.52587238])
Now, I create a simple model.
# Simple neural net with three outputs
input_layer = Input((5,))
hidden_layer = Dense(16)(input_layer)
output_layer = Dense(2)(hidden_layer)
# Model
model = Model(inputs=input_layer, outputs=output_layer)
My goal is to create a loss function as
def custom_loss(y_true, y_pred):
    # Normal MSE loss
    mse = K.mean(K.square(y_true - y_pred), axis=-1)
    # Assume that I properly pass the model object into this method and use its
    # predict method, so that the current network weights are used
    new_constraint = K.sum(y_pred - model.predict(supporters))
    return mse + new_constraint
Then, I go ahead and compile my model.
model.compile(loss=custom_loss, optimizer='sgd')
model.fit(features, labels, epochs=1, batch_size=1)
The problem is that since the batch size is one, I want to make sure that the loss function only considers the supporter of the current training input. For example, if I am training the third point in features, then I want to use the fifth and sixth vectors while creating new_constraint. How can I accomplish this?
You can implement it like this (I have used the TensorFlow-based Keras API, but it shouldn't matter).
import numpy as np
import tensorflow as tf
from tensorflow.keras import Input, layers, Model
from tensorflow.keras import backend as K
features = np.random.rand(100, 5)
labels = np.random.rand(100, 2)
supporters = np.random.rand(200, 5) # each feature gets two supporters.
# I will get both support vectors to iterate over
supporters_1 = supporters[::2, :]
supporters_2 = supporters[1::2, :]
print(supporters_1.shape, supporters_2.shape)
# Result -> ((100, 5), (100, 5))
# Create a tf dataset to use in training
dataset = tf.data.Dataset.from_tensor_slices(((features, supporters_1, supporters_2), labels)).batch(1)
# A look at what it returns
for i in dataset:
    print(i)
    break
'''
Result:
((<tf.Tensor: shape=(1, 5), dtype=float64, numpy=array([[0.42834492, 0.01041871, 0.53058175, 0.69453215, 0.83901092]])>,
<tf.Tensor: shape=(1, 5), dtype=float64, numpy=array([[0.1724601 , 0.14386688, 0.49018201, 0.13565471, 0.35159235]])>,
<tf.Tensor: shape=(1, 5), dtype=float64, numpy=array([[0.87243349, 0.98779049, 0.98405784, 0.74069913, 0.25763667]])>),
<tf.Tensor: shape=(1, 2), dtype=float64, numpy=array([[0.20993531, 0.70153453]])>)
'''
#=========================================================
# Creating the model (Input size is 5 and not 2 in your sample so I changed it)
# Same for the label shape
input_layer = Input((5,))
hidden_layer = layers.Dense(16)(input_layer)
output_layer = layers.Dense(2)(hidden_layer)
# Model
model = Model(inputs=input_layer, outputs=output_layer)
#=========================================================
# Implementing the custom loss
# Without `K.abs` the constraint can be negative, hence the `K.abs`
def custom_loss(y_true, y_pred, support_pred_1, support_pred_2):
    mse = tf.keras.losses.mse(y_true, y_pred)
    new_constraint = K.abs(K.sum(y_pred - [support_pred_1, support_pred_2]))
    return mse + new_constraint
# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
'''
Now we create a custom training loop. In this we will get the logits
of all the inputs and then compute loss using the custom loss
function and then optimize on that loss.
'''
epochs = 10
for epoch in range(epochs):
    print("Start of epoch %d" % (epoch,))
    for step, ((features, support_1, support_2), labels) in enumerate(dataset):
        with tf.GradientTape() as tape:
            logits = model(features, training=True)
            logits_1 = model(support_1, training=True)
            logits_2 = model(support_2, training=True)
            loss_value = custom_loss(labels, logits, logits_1, logits_2)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
    print('loss_value: ', loss_value)
EDIT: There is another way to do this, shown below:
# Everything same till the supporters_1, supporters_2
def combine(inputs, targets):
    features = inputs[0]
    supports1 = inputs[1]
    supports2 = inputs[2]
    # Stack the feature and its two supporters into a single batch of three rows
    final = tf.stack((features, supports1, supports2))
    final = tf.reshape(final, (3, 5))
    return final, targets
# Creating the dataset
dataset = tf.data.Dataset.from_tensor_slices(((features, supporters_1, supporters_2), labels)).batch(1)
dataset = dataset.map(combine, num_parallel_calls=-1)
# Check the output
for i in dataset:
    print(i)
    break
'''
(<tf.Tensor: shape=(3, 5), dtype=float64, numpy=
array([[0.35641985, 0.93025517, 0.72874829, 0.81810538, 0.46682277],
[0.95497516, 0.71722253, 0.10608685, 0.37267656, 0.94748968],
[0.04822454, 0.00480376, 0.08479184, 0.51133809, 0.38242403]])>, <tf.Tensor: shape=(1, 2), dtype=float64, numpy=array([[0.21399956, 0.97149716]])>)
'''
#================MODEL=================
input_layer = Input((5,))
hidden_layer = layers.Dense(16)(input_layer)
output_layer = layers.Dense(2)(hidden_layer)
# Model
model = Model(inputs=input_layer, outputs=output_layer)
#=======================================
# change the loss function accordingly
'''
The first row in the y_pred will be the prediction corresponding to
actual features and the rest will be predictions corresponding to
supports and hence you can change the loss function as below.
'''
def custom_loss(y_true, y_pred):
    mse = tf.keras.losses.mse(y_true, y_pred[0, :])
    new_constraint = K.abs(K.sum(y_pred[0, :] - y_pred[1:, :]))
    return mse + new_constraint
# Compile
model.compile(loss=custom_loss, optimizer='adam')
# train
model.fit(dataset, epochs=5)
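One small follow-up (a sketch, not from the original answer): the model's input shape is still (5,), so at inference time you can call predict on the raw features alone; the stacking in combine is only needed so the loss can see the supporter predictions during training.
# Plain inference on the original features only; the supporters are not needed here.
preds = model.predict(features)
print(preds.shape)  # -> (100, 2)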
I was playing around with tf.keras and ran the predict() method on two Model objects with the same weight initialization.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Masking, Input, Embedding, Dense
from tensorflow.keras.models import Model
tf.enable_eager_execution()
np.random.seed(10)
X = np.asarray([
    [0, 1, 2, 3, 3],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
])
y = [0, 1, 1]
seq_len = X.shape[1]
inp = Input(shape=[seq_len])
emb = Embedding(4, 10, name='embedding')(inp)
x = emb
x = LSTM(5, return_sequences=False, name='lstm')(x)
out = Dense(1, activation='sigmoid', name='out')(x)
model = Model(inputs=inp, outputs=out)
model.summary()
preds = model.predict(X)
inp = Input(shape=[seq_len])
emb = Embedding(4, 10, name='embedding', weights=model.get_layer('embedding').get_weights()[0])(inp)
x = emb
x = LSTM(5, return_sequences=False, weights=model.get_layer('lstm').get_weights()[0])(x)
out = Dense(1, activation='sigmoid', weights=model.get_layer('out').get_weights()[0])(x)
model_2 = Model(inputs=inp, outputs=out)
model_2.summary()
preds_2 = model_2.predict(X)
print(preds, preds_2)
I am not sure why but the results of the two predictions are different. I got these when I ran the print function. You might get something different.
[[0.5027414 ]
[0.5019673 ]
[0.50134844]] [[0.5007331]
[0.5002397]
[0.4996575]]
I am trying to understand how keras works. Any explanation would be appreciated. Thank you.
NOTE: THERE IS NO LEARNING INVOLVED HERE. I don't understand where the randomness comes from.
Try changing the optimizer from adam to SGD or something else. I noticed that I used to get different results with the same model, and this fixed the problem. Also, take a look here to fix the initial weights. By the way, I don't know why or how the optimizer can affect results at test time with the same model.
The problem is that you are not copying all of the weights. I have no idea why your call mechanically works, but it is easy to see that you are not by examining get_weights without the [0] indexing.
E.g. these are not copied:
model.get_layer('lstm').get_weights()[1]
array([[ 0.11243069, -0.1028666 , 0.01080172, -0.07471965, 0.05566487,
-0.12818974, 0.34882438, -0.17163819, -0.21306667, 0.5386005 ,
-0.03643916, 0.03835883, -0.31128728, 0.04882491, -0.05503649,
-0.22660127, -0.4683674 , -0.00415642, -0.29038426, -0.06893865],
[-0.5117522 , 0.01057898, -0.23182054, 0.03220385, 0.21614116,
0.0732751 , -0.30829042, 0.06233712, -0.54017985, -0.1026137 ,
-0.18011908, 0.15880923, -0.21900705, -0.11910527, -0.03808065,
0.07623457, -0.13157862, -0.18740109, 0.06135096, -0.21589288],
[-0.2295578 , -0.12452635, -0.08739456, -0.1880849 , 0.2220488 ,
-0.14575425, 0.32249492, 0.05235165, -0.09479579, 0.2496742 ,
0.10411342, -0.0263749 , 0.33186644, -0.1838699 , 0.28964192,
-0.2414586 , 0.41612682, 0.13791762, 0.13942356, -0.36176005],
[-0.14428475, -0.02090888, 0.27968913, 0.09452424, 0.1291543 ,
-0.43372717, -0.11366601, 0.37842247, 0.3320751 , 0.21959782,
-0.4242381 , 0.02412989, -0.24809352, 0.2508208 , -0.06223384,
0.08648364, 0.17311276, -0.05988384, 0.02276517, -0.1473657 ],
[ 0.28600952, -0.37206012, 0.21376705, -0.16566195, 0.0833357 ,
-0.00887177, 0.01394618, 0.5345957 , -0.25116244, -0.17159337,
0.096329 , -0.32286254, 0.02044407, -0.1393016 , -0.0767666 ,
0.1505355 , -0.28456056, 0.16909163, 0.16806729, -0.14622769]],
dtype=float32)
Also, if you name the lstm layer in model 2, you can see that parts of the weights are not equal:
model_2.get_layer("lstm").get_weights()[1] - model.get_layer("lstm").get_weights()[1]
Perhaps setting the NumPy seed is not enough to make the operations and weights deterministic. The TensorFlow documentation suggests that to get deterministic weights you should instead run
tf.keras.utils.set_random_seed(1)
tf.config.experimental.enable_op_determinism()
https://www.tensorflow.org/api_docs/python/tf/config/experimental/enable_op_determinism#:~:text=Configures%20TensorFlow%20ops%20to%20run%20deterministically.&text=When%20op%20determinism%20is%20enabled,is%20useful%20for%20debugging%20models.
Could you check if it helps? (Your code seems to be written for TF v1, so it does not run on my v2 setup without adaptation.)
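Applied to the snippet in the question, a minimal sketch would be to seed everything before any layers are built (assuming a recent TF 2.x; these APIs do not exist in TF 1.x):
import tensorflow as tf

# Seeds Python, NumPy and TensorFlow in one call and forces deterministic ops;
# must run before any layer (and hence any random initializer) is created.
tf.keras.utils.set_random_seed(1)
tf.config.experimental.enable_op_determinism()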
The thing about machine learning is that it doesn't always learn quite the same way. It involves lots of probabilities, so on a larger scale the results will tend to converge towards one value, but individual runs can and will give varying results.
More info here
It is absolutely normal that the many runs with the same input data
give different output. It is mainly due to the internal stochasticity
of such machine learning techniques (example: ANN, Decision Trees
building algorithms, etc.).
- Nabil Belgasmi, Université de la Manouba
There is not a specific method or technique. The results and
evaluation of the performance depends on several factors: the data
type, parameters of induction function, training set (supervised),
etc. What is important is to compare the results of using metric
measurements such as recall, precision, F_measure, ROC curves or other
graphical methods.
- Jésus Antonio Motta Laval University
EDIT
The predict() function takes an array of one or more data instances.
The example below demonstrates how to make regression predictions on multiple data instances with an unknown expected outcome.
# example of making predictions for a regression problem
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=1)
scalarX, scalarY = MinMaxScaler(), MinMaxScaler()
scalarX.fit(X)
scalarY.fit(y.reshape(100,1))
X = scalarX.transform(X)
y = scalarY.transform(y.reshape(100,1))
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=1000, verbose=0)
# new instances where we do not know the answer
Xnew, a = make_regression(n_samples=3, n_features=2, noise=0.1, random_state=1)
Xnew = scalarX.transform(Xnew)
# make a prediction
ynew = model.predict(Xnew)
# show the inputs and predicted outputs
for i in range(len(Xnew)):
    print("X=%s, Predicted=%s" % (Xnew[i], ynew[i]))
Running the example makes multiple predictions, then prints the inputs and predictions side by side for review.
X=[0.29466096 0.30317302], Predicted=[0.17097184]
X=[0.39445118 0.79390858], Predicted=[0.7475489]
X=[0.02884127 0.6208843 ], Predicted=[0.43370453]
SOURCE
Disclaimer: The predict() function itself is slightly random (probabilistic)
I am doing a slight modification of a standard neural network by defining a custom loss function. The custom loss function depends not only on y_true and y_pred, but also on the training data. I implemented it using the wrapping solution described here.
Specifically, I wanted to define a custom loss function that is the standard mse plus the mse between the input and the square of y_pred:
def custom_loss(x_true):
    def loss(y_true, y_pred):
        return K.mean(K.square(y_pred - y_true) + K.square(y_true - x_true))
    return loss
Then I compile the model using
model_custom.compile(loss = custom_loss( x_true=training_data ), optimizer='adam')
fit the model using
model_custom.fit(training_data, training_label, epochs=100, batch_size = training_data.shape[0])
All of the above works fine, because the batch size is actually the number of all the training samples.
But if I set a different batch_size (e.g., 10) when I have 1000 training samples, there will be an error
Incompatible shapes: [1000] vs. [10].
It seems that Keras can automatically adjust the size of the inputs to its own loss functions based on the batch size, but cannot do so for a custom loss function.
Do you know how to solve this issue?
Thank you!
==========================================================================
* Update: the batch size issue is solved, but another issue occurred
Thank you, Ori, for the suggestion of concatenating the input and output layers! It "worked", in the sense that the code now runs with any batch size. However, the result from training the new model seems to be wrong... Below is a simplified version of the code to demonstrate the problem:
import numpy as np
import scipy.io
import keras
from keras import backend as K
from keras.models import Model
from keras.layers import Input, Dense, Activation
from numpy.random import seed
from tensorflow import set_random_seed
def custom_loss(y_true, y_pred): # this is essentially the mean_squared_error
    mse = K.mean(K.square(y_pred[:, 2] - y_true))
    return mse
# set the seeds so that we get the same initialization across different trials
seed_numpy = 0
seed_tensorflow = 0
# generate data of x = [ y^3 y^2 ]
y = np.random.rand(5000+1000,1) * 2 # generate 5000 training and 1000 testing samples
x = np.concatenate( ( np.power(y, 3) , np.power(y, 2) ) , axis=1 )
training_data = x[0:5000:1,:]
training_label = y[0:5000:1]
testing_data = x[5000:6000:1,:]
testing_label = y[5000:6000:1]
# build the standard neural network with one hidden layer
seed(seed_numpy)
set_random_seed(seed_tensorflow)
input_standard = Input(shape=(2,)) # input
hidden_standard = Dense(10, activation='relu', input_shape=(2,))(input_standard) # hidden layer
output_standard = Dense(1, activation='linear')(hidden_standard) # output layer
model_standard = Model(inputs=[input_standard], outputs=[output_standard]) # build the model
model_standard.compile(loss='mean_squared_error', optimizer='adam') # compile the model
model_standard.fit(training_data, training_label, epochs=50, batch_size = 500) # train the model
testing_label_pred_standard = model_standard.predict(testing_data) # make prediction
# get the mean squared error
mse_standard = np.sum( np.power( testing_label_pred_standard - testing_label , 2 ) ) / 1000
# build the neural network with the custom loss
seed(seed_numpy)
set_random_seed(seed_tensorflow)
input_custom = Input(shape=(2,)) # input
hidden_custom = Dense(10, activation='relu', input_shape=(2,))(input_custom) # hidden layer
output_custom_temp = Dense(1, activation='linear')(hidden_custom) # output layer
output_custom = keras.layers.concatenate([input_custom, output_custom_temp])
model_custom = Model(inputs=[input_custom], outputs=[output_custom]) # build the model
model_custom.compile(loss = custom_loss, optimizer='adam') # compile the model
model_custom.fit(training_data, training_label, epochs=50, batch_size = 500) # train the model
testing_label_pred_custom = model_custom.predict(testing_data) # make prediction
# get the mean squared error
mse_custom = np.sum( np.power( testing_label_pred_custom[:,2:3:1] - testing_label , 2 ) ) / 1000
# compare the result
print( [ mse_standard , mse_custom ] )
Basically, I have a standard one-hidden-layer neural network, and a custom one-hidden-layer neural network whose output layer is concatenated with the input layer. For testing purposes, I did not use the concatenated input in the custom loss function, because I wanted to see whether the custom network can reproduce the standard neural network. Since the custom loss function is equivalent to the standard 'mean_squared_error' loss, both networks should have the same training results (I also reset the random seeds to make sure that they have the same initialization).
However, the training results are very different. It seems that the concatenation makes the training process different? Any ideas?
Thank you again for all your help!
Final update: Ori's approach of concatenating input and output layers works, and is verified by using the generator. Thanks!!
The problem is that when compiling the model, you set x_true to be a static tensor with the size of all the samples, while the inputs to Keras loss functions are y_true and y_pred, each of size [batch_size, :].
As I see it, there are two ways to solve this. The first is to use a generator to create the batches, so that you control which indices are evaluated each time; in the loss function you can then slice the x_true tensor to fit the samples being evaluated:
def custom_loss(x_true):
    def loss(y_true, y_pred):
        x_true_samples = relevant_samples(x_true)
        return K.mean(K.square(y_pred - y_true) + K.square(y_true - x_true_samples))
    return loss
This solution can get complicated, so I would suggest a simpler workaround:
Concatenate the input layer with the output layer, so that your new output is of the form original_output, input.
Now you can use a new, modified loss function:
def loss(y_true, y_pred):
    return K.mean(K.square(y_pred[:, :output_shape] - y_true[:, :output_shape]) +
                  K.square(y_true[:, :output_shape] - y_pred[:, output_shape:]))
Now your new loss function will take into account both the input data and the prediction.
Edit:
Note that although you set the seed, your models are not exactly the same, and since you did not use a generator, you let Keras choose the batches, so it may pick different samples for the two models.
As your model does not converge, different samples can lead to different results.
I added a generator to your code, to verify the samples we pick for training, now you can see both results are the same:
def custom_loss(y_true, y_pred): # this is essentially the mean_squared_error
    mse = keras.losses.mean_squared_error(y_true, y_pred[:, 2])
    return mse

def generator(x, y, batch_size):
    curIndex = 0
    batch_x = np.zeros((batch_size, 2))
    batch_y = np.zeros((batch_size, 1))
    while True:
        for i in range(batch_size):
            batch_x[i] = x[curIndex, :]
            batch_y[i] = y[curIndex, :]
            curIndex += 1
            if curIndex == 5000:
                curIndex = 0
        yield batch_x, batch_y
# set the seeds so that we get the same initialization across different trials
seed_numpy = 0
seed_tensorflow = 0
# generate data of x = [ y^3 y^2 ]
y = np.random.rand(5000+1000,1) * 2 # generate 5000 training and 1000 testing samples
x = np.concatenate( ( np.power(y, 3) , np.power(y, 2) ) , axis=1 )
training_data = x[0:5000:1,:]
training_label = y[0:5000:1]
testing_data = x[5000:6000:1,:]
testing_label = y[5000:6000:1]
batch_size = 32
# build the standard neural network with one hidden layer
seed(seed_numpy)
set_random_seed(seed_tensorflow)
input_standard = Input(shape=(2,)) # input
hidden_standard = Dense(10, activation='relu', input_shape=(2,))(input_standard) # hidden layer
output_standard = Dense(1, activation='linear')(hidden_standard) # output layer
model_standard = Model(inputs=[input_standard], outputs=[output_standard]) # build the model
model_standard.compile(loss='mse', optimizer='adam') # compile the model
#model_standard.fit(training_data, training_label, epochs=50, batch_size = 10) # train the model
model_standard.fit_generator(generator(training_data,training_label,batch_size), steps_per_epoch= 32, epochs= 100)
testing_label_pred_standard = model_standard.predict(testing_data) # make prediction
# get the mean squared error
mse_standard = np.sum( np.power( testing_label_pred_standard - testing_label , 2 ) ) / 1000
# build the neural network with the custom loss
seed(seed_numpy)
set_random_seed(seed_tensorflow)
input_custom = Input(shape=(2,)) # input
hidden_custom = Dense(10, activation='relu', input_shape=(2,))(input_custom) # hidden layer
output_custom_temp = Dense(1, activation='linear')(hidden_custom) # output layer
output_custom = keras.layers.concatenate([input_custom, output_custom_temp])
model_custom = Model(inputs=input_custom, outputs=output_custom) # build the model
model_custom.compile(loss = custom_loss, optimizer='adam') # compile the model
#model_custom.fit(training_data, training_label, epochs=50, batch_size = 10) # train the model
model_custom.fit_generator(generator(training_data,training_label,batch_size), steps_per_epoch= 32, epochs= 100)
testing_label_pred_custom = model_custom.predict(testing_data)
# get the mean squared error
mse_custom = np.sum( np.power( testing_label_pred_custom[:,2:3:1] - testing_label , 2 ) ) / 1000
# compare the result
print( [ mse_standard , mse_custom ] )
I have built a LSTM model using Keras library to predict duplicate questions on the Quora official dataset. The test labels are 0 or 1. 1 indicates the question pair is duplicate. After building the model using model.fit, I test the model using model.predict on the test data. The output is an array of values(probabilities) like below:
[ 0.00514298]
[ 0.15161049]
[ 0.27588326]
[ 0.00236167]
[ 1.80067325]
[ 0.01048524]
[ 1.43425131]
[ 1.99202418]
[ 0.54853892]
[ 0.02514757]
I am only showing the first 10 values in the array. I don't understand what these values mean or how to compare them against the test labels to calculate test accuracy. I want the model to output binary predicted values of 0 or 1 rather than probabilities. Please refer to the last section of my code below:
sequence_1_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences_1 = embedding_layer(sequence_1_input)
x1 = lstm_layer(embedded_sequences_1)
sequence_2_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences_2 = embedding_layer(sequence_2_input)
y1 = lstm_layer(embedded_sequences_2)
merged = concatenate([x1, y1])
merged = Dropout(rate_drop_dense)(merged)
merged = BatchNormalization()(merged)
merged = Dense(num_dense, activation=act)(merged)
merged = Dropout(rate_drop_dense)(merged)
merged = BatchNormalization()(merged)
preds = Dense(1, activation='sigmoid')(merged)
########################################
## train the model
########################################
model = Model(inputs=[sequence_1_input, sequence_2_input],
              outputs=preds)
model.compile(loss='binary_crossentropy',
              optimizer='nadam',
              metrics=['acc'])
hist = model.fit([data_1_train, data_2_train], labels_train,
                 validation_data=([data_1_val, data_2_val], labels_val, weight_val),
                 epochs=200, batch_size=2048, shuffle=True,
                 class_weight=class_weight, callbacks=[early_stopping, model_checkpoint])
preds = model.predict([test_data_1, test_data_2], batch_size=8192,
                      verbose=1)
preds += model.predict([test_data_2, test_data_1], batch_size=8192,
                       verbose=1)
preds /= 2
print(type(preds))
print(preds[:20])
print('preds.ravel')
print(preds.ravel())
As you say, your output is a NumPy array of probabilities. You can convert it to binary labels by doing, for example, (model.predict(X) > 0.5).astype(int).
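For instance, a small sketch applied to the preds array from the question (test_labels here is an assumed name for the ground-truth 0/1 labels):
import numpy as np

# Threshold the averaged probabilities at 0.5 to get hard 0/1 predictions.
pred_labels = (preds > 0.5).astype(int).ravel()

# Accuracy against the ground-truth labels (variable name assumed).
accuracy = np.mean(pred_labels == np.asarray(test_labels).ravel())
print('Test accuracy:', accuracy)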
Artificial neural networks are probabilistic classifiers, so your output is absolutely fine. It's just the probability of belonging to your target label.
In addition, one interesting fact is that 0.5 may not be the threshold you want to use. It depends on how important true positives and false positives are in your task. You can take a look at ROC curves to find the optimal threshold.
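A quick sketch of inspecting thresholds with scikit-learn's ROC curve (y_true is an assumed name for the 0/1 test labels; maximizing TPR - FPR is just one common heuristic):
from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_true, preds.ravel())
# Youden's J statistic: pick the threshold that maximizes TPR - FPR.
best_threshold = thresholds[(tpr - fpr).argmax()]
print('Suggested threshold:', best_threshold)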
You can try changing the activation function of your last layer to softmax, or you can write your own softmax function and pass your output through it. Here's an example of a custom softmax function:
def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)
Any advice is welcome as this is an ambitious second coding project. :)
Specifically, I'm having two different issues with this DNN.
I can only seem to get it to run 1 of 100 evaluation steps and,
Trouble getting meaningful predictions.
At some point it was running all 100 steps of evaluation. I cannot seem to replicate that now for anything. What am I missing?
The data set is for a dice game. The predictions I'm looking for would be in an array of the same shape as the features and labels with a binary prediction for each position in the array.
I have tried different array shapes and depths to the point that I'm all turned around. Perhaps a different estimator is the solution? It throws a "features dictionary '1' not found" error if I try to feed a single feature/label combination to the predictor; it demands a set of the same size as the training and test sets.
Is there a way to return predictions in this way?
Example:
predict_feature = {'0': [1, 2, 5, 1, 4, 3]}  # 1's and 5's would be 'keepers'
predict_label = np.array([1, 0, 1, 1, 0, 0])
desired_output = np.array([.91, .12, .89, .92, .06, .15])
The features are generated randomly and labels are created via scoring algorithm from the game. They are passed through the below to create the features dictionary and put labels into an array. Similar functions create the evaluation and prediction sets.
def train_evaluation_set(features, labels):
    """Creates training input set"""
    feature = {}
    features = [[digit for digit in features[x]] for x in range(len(features))]
    for x in range(len(features)):
        feature.update({"{}".format(x): features[x]})
    label = np.array(labels)
    return feature, label
Tensors are then created.
def train_input_fn(feature, label, batch_size):
    """Input function for training"""
    dataset = tf.data.Dataset.from_tensor_slices((dict(feature), label))
    dataset = dataset.shuffle(shuffle_x).repeat().batch(100)
    iterator = dataset.make_one_shot_iterator()
    feature, label = iterator.get_next()
    return feature, label
The estimator is set up thusly:
def main(main=None, argv=None):
    # Set feature columns.
    my_feature_columns = []
    for key in feature.keys():
        my_feature_columns.append(tf.feature_column.numeric_column(key=key))
    # Instantiate estimator.
    classifier = tf.estimator.DNNClassifier(
        feature_columns=my_feature_columns,
        hidden_units=[100, 100, 100],
        n_classes=2)
    # Train the Model.
    classifier.train(
        input_fn=lambda: train_input_fn(feature, label, batch_size),
        steps=train_steps)
    # Evaluate the model.
    eval_result = classifier.evaluate(
        input_fn=lambda: eval_input_fn(test_feature, test_label, batch_size),
        steps=200)
    print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
    # Generate predictions from the model
    predictions = classifier.predict(
        input_fn=lambda: predict_input_fn(predict_feature, predict_label[0]))
    pp.pprint(next(predictions))
From here the training runs smoothly and one evaluation step is completed.
INFO:tensorflow:Loss for final step: 0.00292182.
WARNING:tensorflow:Casting <dtype: 'float32'> labels to bool.
WARNING:tensorflow:Casting <dtype: 'float32'> labels to bool.
INFO:tensorflow:Starting evaluation at 2018-02-20-09:06:14
INFO:tensorflow:Restoring parameters from C:\Users\Paul\AppData\Local\Temp\tmp97u0tbvx\model.ckpt-1000
INFO:tensorflow:Evaluation [1/200]
INFO:tensorflow:Finished evaluation at 2018-02-20-09:06:19
INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.666667, accuracy_baseline = 0.833333, auc = 0.8, auc_precision_recall = 0.25, average_loss = 0.623973, global_step = 1000, label/mean = 0.166667, loss = 3.74384, prediction/mean = 0.216801
Test set accuracy: 0.667
I suspect that the WARNING lines are where my problem with the prediction lies, even though the labels have already been converted to booleans, but I have no clue what to do about it.
And, finally, pretty print gives me:
{'class_ids': array([1], dtype=int64),
'classes': array([b'1'], dtype=object),
'logistic': array([ 0.70525986], dtype=float32),
'logits': array([ 0.87247205], dtype=float32),
'probabilities': array([ 0.2947402 , 0.70525986], dtype=float32)}
Full code can be found at https://github.com/llpk79/DNNTenThousand