Tensorflow DNNClassifier predictions as array - python

Any advice is welcome as this is an ambitious second coding project. :)
Specifically, I'm having two different issues with this DNN:
1. I can only seem to get it to run 1 of 100 evaluation steps, and
2. I'm having trouble getting meaningful predictions.
At some point it was running all 100 steps of evaluation. I cannot seem to replicate that now for anything. What am I missing?
The data set is for a dice game. The predictions I'm looking for would be in an array of the same shape as the features and labels with a binary prediction for each position in the array.
I have tried different array shapes and depths to the point that I'm all turned around. Perhaps a different estimator is the solution? It throws a "features dictionary '1' not found" error if I try to feed one feature/label combination to the predictor; it demands the same set size as the training and test sets.
Is there a way to return predictions in this way?
Example:
predict_feature = {'0': [1, 2, 5, 1, 4, 3]} #1's and 5's would be 'keepers'
predict_label = np.array([1, 0, 1, 1, 0, 0])
desired_output = np.array([0.91, 0.12, 0.89, 0.92, 0.06, 0.15])
The features are generated randomly and labels are created via scoring algorithm from the game. They are passed through the below to create the features dictionary and put labels into an array. Similar functions create the evaluation and prediction sets.
def train_evaluation_set(features, labels):
    """Creates training input set."""
    feature = {}
    features = [[digit for digit in features[x]] for x in range(len(features))]
    for x in range(len(features)):
        feature.update({"{}".format(x): features[x]})
    label = np.array(labels)
    return feature, label
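For context, a minimal usage sketch with the dice-roll example from above (the variable names rolls and keeps are hypothetical; the real data comes from the game's random roll generator and scoring algorithm):
rolls = [[1, 2, 5, 1, 4, 3], [5, 5, 2, 3, 6, 1]]
keeps = [[1, 0, 1, 1, 0, 0], [1, 1, 0, 0, 0, 0]]
feature, label = train_evaluation_set(rolls, keeps)
# feature == {'0': [1, 2, 5, 1, 4, 3], '1': [5, 5, 2, 3, 6, 1]}
# label   == np.array([[1, 0, 1, 1, 0, 0], [1, 1, 0, 0, 0, 0]])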
Tensors are then created.
def train_input_fn(feature, label, batch_size):
    """Input function for training."""
    dataset = tf.data.Dataset.from_tensor_slices((dict(feature), label))
    # note: the batch_size argument is ignored here; the batch size is hardcoded to 100
    dataset = dataset.shuffle(shuffle_x).repeat().batch(100)
    iterator = dataset.make_one_shot_iterator()
    feature, label = iterator.get_next()
    return feature, label
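The eval_input_fn and predict_input_fn used below aren't shown here; a typical sketch (an assumption on my part, since the actual functions live in the linked repo) would skip shuffle/repeat so evaluation makes a single pass over the data:
def eval_input_fn(feature, label, batch_size):
    """Sketch of an evaluation input function (assumed, not from the original code)."""
    dataset = tf.data.Dataset.from_tensor_slices((dict(feature), label))
    dataset = dataset.batch(batch_size)  # no shuffle/repeat for evaluation
    return dataset.make_one_shot_iterator().get_next()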
The estimator is set up thusly:
def main(main=None, argv=None):
    # Set feature columns.
    my_feature_columns = []
    for key in feature.keys():
        my_feature_columns.append(tf.feature_column.numeric_column(key=key))
    # Instantiate estimator.
    classifier = tf.estimator.DNNClassifier(
        feature_columns=my_feature_columns,
        hidden_units=[100, 100, 100],
        n_classes=2)
    # Train the Model.
    classifier.train(
        input_fn=lambda: train_input_fn(feature, label, batch_size),
        steps=train_steps)
    # Evaluate the model.
    eval_result = classifier.evaluate(
        input_fn=lambda: eval_input_fn(test_feature, test_label, batch_size),
        steps=200)
    print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
    # Generate predictions from the model.
    predictions = classifier.predict(
        input_fn=lambda: predict_input_fn(predict_feature, predict_label[0]))
    pp.pprint(next(predictions))
From here the training runs smoothly and one evaluation step is completed.
INFO:tensorflow:Loss for final step: 0.00292182.
WARNING:tensorflow:Casting <dtype: 'float32'> labels to bool.
WARNING:tensorflow:Casting <dtype: 'float32'> labels to bool.
INFO:tensorflow:Starting evaluation at 2018-02-20-09:06:14
INFO:tensorflow:Restoring parameters from C:\Users\Paul\AppData\Local\Temp\tmp97u0tbvx\model.ckpt-1000
INFO:tensorflow:Evaluation [1/200]
INFO:tensorflow:Finished evaluation at 2018-02-20-09:06:19
INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.666667, accuracy_baseline = 0.833333, auc = 0.8, auc_precision_recall = 0.25, average_loss = 0.623973, global_step = 1000, label/mean = 0.166667, loss = 3.74384, prediction/mean = 0.216801
Test set accuracy: 0.667
I have a suspicion that the WARNING lines are where my problem with the prediction lies, even though the labels have already been cast to boolean, but I have no clue what to do about it.
And, finally, pretty print gives me:
{'class_ids': array([1], dtype=int64),
'classes': array([b'1'], dtype=object),
'logistic': array([ 0.70525986], dtype=float32),
'logits': array([ 0.87247205], dtype=float32),
'probabilities': array([ 0.2947402 , 0.70525986], dtype=float32)}
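For reference, a hedged sketch of collecting the positive-class probability from each prediction dict returned by classifier.predict (using the 'probabilities' field shown above); this still yields one probability per example, not the per-die array I'm after:
all_probs = [pred['probabilities'][1] for pred in classifier.predict(
    input_fn=lambda: predict_input_fn(predict_feature, predict_label[0]))]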
Full code can be found at https://github.com/llpk79/DNNTenThousand

Related

How to visualize training process with output per batch/epoch?

My neural network in Keras learns a representation of my original data. In order to see exactly how it learns I thought it would be interesting to plot the data for every training batch (or epoch alternatively) and convert the plots into a video.
I'm stuck on how to get the outputs of my model during the training phase.
I thought about doing something like this (pseudo code):
epochs = 200
plt_outputs = []
for i in range(epochs):
    model.fit(x_train, y_train, epochs=1)
    plt_outputs.append(output_layer(x_test))
where output_layer is the layer in my neural network I'm interested in. Afterwards I would use plot_data to generate each plot and turn it into a video. (That part I'm not concerned about yet..)
But that doesn't strike me as a good solution, plus I don't know how to get the output for every batch. Any thoughts on this?
You can customize what happens in the test step, much like this official tutorial:
import tensorflow as tf
import numpy as np

class CustomModel(tf.keras.Model):
    def test_step(self, data):
        # Unpack the data
        x, y = data
        # Compute predictions
        y_pred = self(x, training=False)
        test_outputs.append(y_pred)  # ADD THIS HERE
        # Updates the metrics tracking the loss
        self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        # Update the metrics.
        self.compiled_metrics.update_state(y, y_pred)
        # Return a dict mapping metric names to current value.
        # Note that it will include the loss (tracked in self.metrics).
        return {m.name: m.result() for m in self.metrics}

# Construct an instance of CustomModel
inputs = tf.keras.Input(shape=(8,))
x = tf.keras.layers.Dense(8, activation='relu')(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model = CustomModel(inputs, outputs)
model.compile(loss="mse", metrics=["mae"], run_eagerly=True)

test_outputs = list()  # ADD THIS HERE

# Evaluate with our custom test_step
x = np.random.random((1000, 8))
y = np.random.random((1000, 1))
model.evaluate(x, y)
I added a list, and now the test step appends each batch's output to it. You will need to add run_eagerly=True in model.compile() for this to work. This gives you a list of tensors like this one:
<tf.Tensor: shape=(32, 1), dtype=float32, numpy=
array([[ 0.10866462],
[ 0.2749035 ],
[ 0.08196291],
[ 0.25862294],
[ 0.30985728],
[ 0.20230596],
...
[ 0.17108777],
[ 0.29692617],
[-0.03684975],
[ 0.03525433],
[ 0.26774448],
[ 0.21728781],
[ 0.0840873 ]], dtype=float32)>
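If you then want plain arrays for plotting, one way (a sketch, assuming eager tensors as above) is to stack the collected batches:
import numpy as np
all_test_preds = np.concatenate([t.numpy() for t in test_outputs], axis=0)
# shape: (num_test_samples, 1) -- one row per test example, in batch order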

How to correctly use the Tensorflow MeanIOU metric?

I want to use the MeanIoU metric in keras (doc link). But I don't really understand how it can be integrated with the Keras API. In the example, the prediction and the ground truth are given as binary values, but with Keras we should get probabilities, especially because the loss is mse...
We should have something like:
m = tf.keras.metrics.MeanIoU(num_classes=2)
m.update_state([0, 0, 1, 1], [0.3, 0.6, 0.2, 0.9])
But now the result isn't the same; we have:
# <tf.Variable 'UnreadVariable' shape=(2, 2) dtype=float64, numpy=array([[2., 0.],
# [2., 0.]])>
m.result().numpy() # 0.25
So my question is how should we use this metric if the output of the model is probabilities? binary or even in a multi-class setting (one hot)?
For the Accuracy there is a distinction between BinaryAccuracy and CategoricalAccuracy and they both take probabilities in y_pred. Shouldn't it be the same for MeanIoU?
I am having similar issues. Despite looking for examples online, every demonstration I found applies argmax on the model's output first.
The workaround I have for now is to subclass tf.keras.metrics.MeanIoU:
class MyMeanIOU(tf.keras.metrics.MeanIoU):
    def update_state(self, y_true, y_pred, sample_weight=None):
        return super().update_state(tf.argmax(y_true, axis=-1),
                                    tf.argmax(y_pred, axis=-1),
                                    sample_weight)
It is also possible to create your own function, but it is recommended to subclass tf.keras.metrics.Metric if you wish to benefit from the extra features such as distributed strategies.
I am still looking for cleaner solutions.
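As a hedged usage sketch (assuming a model with one-hot targets and softmax outputs; the model, loss and optimizer here are just placeholders), the subclass drops into compile like the stock metric:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=[MyMeanIOU(num_classes=2)])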
I have the same problem, and I looked into the source code.
In tf2.0, at the end of the update_state function, there is:
current_cm = confusion_matrix.confusion_matrix(
    y_true,
    y_pred,
    self.num_classes,
    weights=sample_weight,
    dtype=dtypes.float64)
and looking into the confusion_matrix function:
with ops.name_scope(name, 'confusion_matrix',
                    (predictions, labels, num_classes, weights)) as name:
    labels, predictions = remove_squeezable_dimensions(
        ops.convert_to_tensor(labels, name='labels'),
        ops.convert_to_tensor(predictions, name='predictions'))
    predictions = math_ops.cast(predictions, dtypes.int64)
    labels = math_ops.cast(labels, dtypes.int64)

    # Sanity checks - underflow or overflow can cause memory corruption.
    labels = control_flow_ops.with_dependencies(
        [check_ops.assert_non_negative(
            labels, message='`labels` contains negative values')],
        labels)
    predictions = control_flow_ops.with_dependencies(
        [check_ops.assert_non_negative(
            predictions, message='`predictions` contains negative values')],
        predictions)

    if num_classes is None:
        num_classes = math_ops.maximum(math_ops.reduce_max(predictions),
                                       math_ops.reduce_max(labels)) + 1
    else:
        num_classes_int64 = math_ops.cast(num_classes, dtypes.int64)
        labels = control_flow_ops.with_dependencies(
            [check_ops.assert_less(
                labels, num_classes_int64, message='`labels` out of bound')],
            labels)
        predictions = control_flow_ops.with_dependencies(
            [check_ops.assert_less(
                predictions, num_classes_int64,
                message='`predictions` out of bound')],
            predictions)

    if weights is not None:
        weights = ops.convert_to_tensor(weights, name='weights')
        predictions.get_shape().assert_is_compatible_with(weights.get_shape())
        weights = math_ops.cast(weights, dtype)

    shape = array_ops.stack([num_classes, num_classes])
    indices = array_ops.stack([labels, predictions], axis=1)
    values = (array_ops.ones_like(predictions, dtype)
              if weights is None else weights)
    cm_sparse = sparse_tensor.SparseTensor(
        indices=indices,
        values=values,
        dense_shape=math_ops.cast(shape, dtypes.int64))
    zero_matrix = array_ops.zeros(math_ops.cast(shape, dtypes.int32), dtype)
    return sparse_ops.sparse_add(zero_matrix, cm_sparse)
The trick is the math_ops.cast(predictions, dtypes.int64) near the top of that code: TF casts the predictions to int64, so when you send [0.3, 0.6, 0.2, 0.9] into that cast, it returns [0, 0, 0, 0].
So that's why you get the confusion matrix
[[2., 0.],
 [2., 0.]]
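So the probabilities have to be turned into class indices before update_state; a minimal sketch of the question's example with a 0.5 threshold (the threshold choice is mine, not prescribed by the metric):
m = tf.keras.metrics.MeanIoU(num_classes=2)
preds = tf.cast(tf.constant([0.3, 0.6, 0.2, 0.9]) > 0.5, tf.int32)  # [0, 1, 0, 1]
m.update_state([0, 0, 1, 1], preds)
m.result().numpy()  # mean IoU over the two classes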

How do I get the predicted labels from a model.predict function from Keras?

I have built an LSTM model using the Keras library to predict duplicate questions on the official Quora dataset. The test labels are 0 or 1; 1 indicates the question pair is duplicate. After fitting the model with model.fit, I test the model using model.predict on the test data. The output is an array of values (probabilities) like below:
[ 0.00514298]
[ 0.15161049]
[ 0.27588326]
[ 0.00236167]
[ 1.80067325]
[ 0.01048524]
[ 1.43425131]
[ 1.99202418]
[ 0.54853892]
[ 0.02514757]
I am only showing the first 10 values in the array. I don't understand what these values mean or how to compare them against the test labels to calculate the test accuracy. I want the model to output the binary predicted values as 0 or 1 rather than probabilities. Please refer to the last section of my code below:
sequence_1_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences_1 = embedding_layer(sequence_1_input)
x1 = lstm_layer(embedded_sequences_1)

sequence_2_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences_2 = embedding_layer(sequence_2_input)
y1 = lstm_layer(embedded_sequences_2)

merged = concatenate([x1, y1])
merged = Dropout(rate_drop_dense)(merged)
merged = BatchNormalization()(merged)
merged = Dense(num_dense, activation=act)(merged)
merged = Dropout(rate_drop_dense)(merged)
merged = BatchNormalization()(merged)
preds = Dense(1, activation='sigmoid')(merged)

########################################
## train the model
########################################
model = Model(inputs=[sequence_1_input, sequence_2_input],
              outputs=preds)
model.compile(loss='binary_crossentropy',
              optimizer='nadam',
              metrics=['acc'])
hist = model.fit([data_1_train, data_2_train], labels_train,
                 validation_data=([data_1_val, data_2_val], labels_val, weight_val),
                 epochs=200, batch_size=2048, shuffle=True,
                 class_weight=class_weight, callbacks=[early_stopping, model_checkpoint])

preds = model.predict([test_data_1, test_data_2], batch_size=8192,
                      verbose=1)
preds += model.predict([test_data_2, test_data_1], batch_size=8192,
                       verbose=1)
preds /= 2

print(type(preds))
print(preds[:20])
print('preds.ravel')
print(preds.ravel())
As you say, your output is a NumPy array of probabilities. You can convert it to binary labels by doing, for example, (model.predict(X) > 0.5).astype(int).
Artificial neural networks are probabilistic classifiers, so your output is absolutely fine. It's just the probability of belonging to your target label.
One additional point: 0.5 may not be the threshold you want to use. It depends on how important true positives and false positives are in your task. You can take a look at ROC curves to find the optimal threshold.
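As a hedged sketch of that last suggestion (assuming scikit-learn is available and that y_test is the array of 0/1 test labels, a name not in the question's code), Youden's J statistic is one common way to pick a threshold from the ROC curve:
import numpy as np
from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_test, preds.ravel())
best_threshold = thresholds[np.argmax(tpr - fpr)]   # maximize TPR - FPR (Youden's J)
binary_preds = (preds.ravel() > best_threshold).astype(int)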
You can try changing your activation function to softmax in your last layer, or you can make your own softmax function and pass your output to that function. Here's an example of a custom softmax function:
import numpy as np

def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)

Do I need to use one_hot encoding if my output variable is binary?

I am developing a Tensorflow network based on their MNIST for beginners template. Basically, I am trying to implement a simple logistic regression in which 10 continuous variables predict a binary outcome, so my inputs are 10 values between 0 and 1, and my target variable (Y_train and Y_test in the code) is a 1 or 0.
My main problem is that there is no change in accuracy no matter how many training steps I run -- it is 0.276667 whether I run 100 or 31240 steps. Additionally, when I switch from the softmax to simply matmul to generate my Y values, I get 0.0 accuracy, which suggests there may be something wrong with my x*W + b calculation. The inputs read in just fine.
What I'm wondering is a) whether I'm not calculating Y values properly because of an error in my code and b) if that's not the case, is it possible that I need to implement the one_hot vectors -- even though my output already takes the form of 0 or 1. If the latter is the case, where do I include the one_hot=TRUE function in my generation of the target values vector? Thanks!
import numpy as np
import tensorflow as tf

train_data = np.genfromtxt("TRAINDATA2.txt", delimiter=" ")
train_input = train_data[:, :10]
train_input = train_input.reshape(31240, 10)
X_train = tf.placeholder(tf.float32, [31240, 10])

train_target = train_data[:, 10]
train_target = train_target.reshape(31240, 1)
Y_train = tf.placeholder(tf.float32, [31240, 1])

test_data = np.genfromtxt("TESTDATA2.txt", delimiter=" ")
test_input = test_data[:, :10]
test_input = test_input.reshape(7800, 10)
X_test = tf.placeholder(tf.float32, [7800, 10])

test_target = test_data[:, 10]
test_target = test_target.reshape(7800, 1)
Y_test = tf.placeholder(tf.float32, [7800, 1])

W = tf.Variable(tf.zeros([10, 1]))
b = tf.Variable(tf.zeros([1]))

Y_obt = tf.nn.softmax(tf.matmul(X_train, W) + b)
Y_obt_test = tf.nn.softmax(tf.matmul(X_test, W) + b)

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Y_obt,
                                                        labels=Y_train)
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

for _ in range(31240):
    sess.run(train_step, feed_dict={X_train: train_input,
                                    Y_train: train_target})

correct_prediction = tf.equal(tf.round(Y_obt_test), Y_test)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={X_test: test_input,
                                    Y_test: test_target}))
Since you map your values to a target with one element, you should not use softmax cross entropy, since the softmax operation transforms the input into a probability distribution, with the sum of all probabilities equal to 1. Since your target has only one element, it will simply output 1 every time, since this is the only possible way to transform the input into a probability distribution.
You should instead use tf.nn.sigmoid_cross_entropy_with_logits() (which is used for binary classification) and also remove the softmax from Y_obt and convert it into tf.sigmoid() for Y_obt_test.
Another way is to one-hot encode your targets and use a network with a two-element output. In this case, you should use tf.nn.softmax_cross_entropy_with_logits(), but remove the tf.nn.softmax() from Y_obt, since the softmax cross entropy expects unscaled logits (https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits). For the Y_obt_test, you should of course not remove it in this case.
Another thing: It might also help to take the mean of the cross entropies with cross_entropy = tf.reduce_mean(tf.sigmoid_cross_entropy_...).
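Putting those suggestions together, a sketch of the changed lines (keeping the question's TF1-style code; names like logits_train are just for illustration, and everything else stays as it was):
logits_train = tf.matmul(X_train, W) + b          # no softmax on the logits
logits_test = tf.matmul(X_test, W) + b
Y_obt_test = tf.sigmoid(logits_test)              # probability of class 1 for evaluation

cross_entropy = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=logits_train, labels=Y_train))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)

correct_prediction = tf.equal(tf.round(Y_obt_test), Y_test)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))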

LSTM Autoencoder

I'm trying to build an LSTM autoencoder with the goal of getting a fixed-size vector from a sequence that represents the sequence as well as possible. This autoencoder consists of two parts:
LSTM Encoder: Takes a sequence and returns an output vector (return_sequences = False)
LSTM Decoder: Takes an output vector and returns a sequence (return_sequences = True)
So, in the end, the encoder is a many to one LSTM and the decoder is a one to many LSTM.
Image source: Andrej Karpathy
On a high level the coding looks like this (similar as described here):
encoder = Model(...)
decoder = Model(...)
autoencoder = Model(encoder.inputs, decoder(encoder(encoder.inputs)))
autoencoder.compile(loss='binary_crossentropy',
                    optimizer='adam',
                    metrics=['accuracy'])
autoencoder.fit(data, data,
                batch_size=100,
                epochs=1500)
The shape (number of training examples, sequence length, input dimension) of the data array is (1200, 10, 5) and looks like this:
array([[[1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0],
        ...,
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]],
       ... ]
Problem: I am not sure how to proceed, especially how to integrate LSTM to Model and how to get the decoder to generate a sequence from a vector.
I am using keras with tensorflow backend.
EDIT: If someone wants to try out, here is my procedure to generate random sequences with moving ones (including padding):
import random
import math

def getNotSoRandomList(x):
    rlen = 8
    rlist = [0 for x in range(rlen)]
    if x <= 7:
        rlist[x] = 1
    return rlist

sequence = [[getNotSoRandomList(x) for x in range(round(random.uniform(0, 10)))]
            for y in range(5000)]

### Padding afterwards
from keras.preprocessing import sequence as seq
data = seq.pad_sequences(
    sequences=sequence,
    padding='post',
    maxlen=None,
    truncating='post',
    value=0.
)
Models can be built any way you want. If I understood it right, you just want to know how to create models with LSTMs?
Using LSTMs
Well, first, you have to define what your encoded vector looks like. Suppose you want it to be an array of 20 elements, a 1-dimension vector. So, shape (None,20). The size of it is up to you, and there is no clear rule to know the ideal one.
And your input must be three-dimensional, such as your (1200,10,5). In keras summaries and error messages, it will be shown as (None,10,5), as "None" represents the batch size, which can vary each time you train/predict.
There are many ways to do this, but, suppose you want only one LSTM layer:
from keras.layers import *
from keras.models import Model
inpE = Input((10,5)) #here, you don't define the batch size
outE = LSTM(units = 20, return_sequences=False, ...optional parameters...)(inpE)
This is enough for a very very simple encoder resulting in an array with 20 elements (but you can stack more layers if you want). Let's create the model:
encoder = Model(inpE,outE)
Now, for the decoder, it gets obscure. You don't have an actual sequence anymore, but a static meaningful vector. You may still want to use LSTMs; they will treat the vector as a sequence.
But here, since the input has shape (None,20), you must first reshape it to some 3-dimensional array in order to attach an LSTM layer next.
The way you will reshape it is entirely up to you. 20 steps of 1 element? 1 step of 20 elements? 10 steps of 2 elements? Who knows?
inpD = Input((20,))
outD = Reshape((10,2))(inpD) #supposing 10 steps of 2 elements
It's important to notice that if you don't have 10 steps anymore, you won't be able to just enable "return_sequences" and have the output you want. You'll have to work a little. Actually, it's not necessary to use "return_sequences" or even to use LSTMs, but you may do that.
Since in my reshape I have 10 timesteps (intentionally), it will be ok to use "return_sequences", because the result will have 10 timesteps (as the initial input)
outD1 = LSTM(5,return_sequences=True,...optional parameters...)(outD)
#5 cells because we want a (None,10,5) vector.
You could work in many other ways, such as simply creating a 50 cell LSTM without returning sequences and then reshaping the result:
alternativeOut = LSTM(50,return_sequences=False,...)(outD)
alternativeOut = Reshape((10,5))(alternativeOut)
And our model goes:
decoder = Model(inpD,outD1)
alternativeDecoder = Model(inpD,alternativeOut)
After that, you unite the models with your code and train the autoencoder.
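For completeness, a minimal sketch of that step, reusing the question's own pattern and the illustrative layer sizes above (loss, optimizer and epochs are simply copied from the question):
autoencoder = Model(inpE, decoder(encoder(inpE)))
autoencoder.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
autoencoder.fit(data, data, batch_size=100, epochs=1500)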
All three models will have the same weights, so you can make the encoder bring results just by using its predict method.
encoderPredictions = encoder.predict(data)
What I often see about LSTMs for generating sequences is something like predicting the next element.
You take just a few elements of the sequence and try to find the next element. And you take another segment one step forward and so on. This may be helpful in generating sequences.
You can find a simple example of a sequence-to-sequence autoencoder here: https://blog.keras.io/building-autoencoders-in-keras.html
Here is an example.
Let's create synthetic data consisting of a few sequences. The idea is to look at these sequences through the lens of an autoencoder, in other words, lowering the dimension or summarizing them into a fixed length.
import numpy as np
import pandas as pd
from sklearn import preprocessing
from keras.preprocessing.sequence import pad_sequences

# define input sequence
sequence = np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
                     [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
                     [0.2, 0.4, 0.6, 0.8],
                     [0.3, 0.6, 0.9, 1.2]])

# prepare to normalize
x = pd.DataFrame(sequence.tolist()).T.values
scaler = preprocessing.StandardScaler()
x_scaled = scaler.fit_transform(x)
sequence_normalized = [col[~np.isnan(col)] for col in x_scaled.T]

# make sure to use dtype='float32' in padding, otherwise the floating point
# values are truncated to integers (the default dtype is int32)
sequence = pad_sequences(sequence, padding='post', dtype='float32')

# reshape input into [samples, timesteps, features]
n_obs = len(sequence)
n_in = 9
sequence = sequence.reshape((n_obs, n_in, 1))
Let's devise a simple LSTM autoencoder:
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from keras.models import Model
from keras.utils import plot_model
import matplotlib.pyplot as plt

# define encoder
visible = Input(shape=(n_in, 1))
encoder = LSTM(2, activation='relu')(visible)

# define reconstruction decoder
decoder1 = RepeatVector(n_in)(encoder)
decoder1 = LSTM(100, activation='relu', return_sequences=True)(decoder1)
decoder1 = TimeDistributed(Dense(1))(decoder1)

# tie it together
myModel = Model(inputs=visible, outputs=decoder1)

# summarize layers
print(myModel.summary())

#sequence = tmp
myModel.compile(optimizer='adam', loss='mse')
history = myModel.fit(sequence, sequence,
                      epochs=400,
                      verbose=0,
                      validation_split=0.1,
                      shuffle=True)
plot_model(myModel, show_shapes=True, to_file='reconstruct_lstm_autoencoder.png')

# demonstrate recreation
yhat = myModel.predict(sequence, verbose=0)
# yhat

# plot our loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model train vs validation loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()
Let's extract the encoder from the trained model and look at the encodings:
# use our encoded layer to encode the training input
# (note: myModel.layers[1] is the LSTM(2) encoder layer)
decoder_layer = myModel.layers[1]
encoded_input = Input(shape=(9, 1))
decoder = Model(encoded_input, decoder_layer(encoded_input))

# we are interested in seeing what the encoded sequences of length 2
# (the dimension of the encoder) look like
out = decoder.predict(sequence)

f = plt.figure()
myx = out[:, 0]
myy = out[:, 1]
s = plt.scatter(myx, myy)
for i, txt in enumerate(out[:, 0]):
    plt.annotate(i + 1, (myx[i], myy[i]))
And here is the representation of the sequences: a scatter plot of each sequence's 2-dimensional encoding.
