Perceptron outputs the same value for all inputs - python

I am new to neural networks. I coded a perceptron in Python 3.10 without using any libraries, but I am facing an issue: it returns either True for all the input data or False for all the input data, and I am not sure why this happens.
Details about the project:
The learning rate of the perceptron is set to 0.1.
It is trained on 100 randomly generated points.
Its purpose is to figure out whether the x-coordinate of a point is greater than its y-coordinate.
It uses the "sign" activation function.
# activation function: return 1 if non-negative else -1
def sig(self, n):
    return 1 if n == abs(n) else -1
The training process looks like this:
# training the perceptron with the training data
for point in train_data:
    neuron.train(point.inputs, point.label)
The training method is defined as:
# training the perceptron with known data
def train(self, train_data, target):
    prediction = self.predict(train_data)
    error = prediction - target
    # nudging the weights by calculating delta weight
    for i in range(len(self.weights)):
        delta_weight = error * train_data[i] * self.learning_rate
        self.weights[i] += delta_weight
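For reference, the textbook perceptron learning rule updates each weight with learning_rate * (target - prediction) * input and treats the bias as an extra weight. Below is a minimal self-contained sketch of that rule; this Perceptron class is an illustration written for this post, not the code from the repo.

import random

class Perceptron:
    def __init__(self, n_inputs, learning_rate=0.1):
        # one weight per input plus a bias weight
        self.weights = [random.uniform(-1, 1) for _ in range(n_inputs)]
        self.bias = random.uniform(-1, 1)
        self.learning_rate = learning_rate

    def predict(self, inputs):
        # weighted sum plus bias, passed through the sign activation
        total = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        return 1 if total >= 0 else -1

    def train(self, inputs, target):
        prediction = self.predict(inputs)
        error = target - prediction          # note the order: target minus prediction
        for i in range(len(self.weights)):
            self.weights[i] += self.learning_rate * error * inputs[i]
        self.bias += self.learning_rate * error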
Link to github repo: https://github.com/cipherDOT/perceptron
Please help me solve this issue.

Related

How can I correctly implement backpropagation using categorical cross-entropy as my loss function in just Numpy and Pandas?

Long story short: How do I fix my backpropagation code so that the weights and biases are changed effectively by my evaluate() function, producing predictions closer to the target values rather than odds no better than guessing?
Details below:
I currently have the backbone of this from-scratch neural network, built using techniques gleaned from Sentdex's Neural Networks From Scratch series on YouTube and a Towards Data Science article for the backpropagation part specifically. It works by creating a large class called NeuralNetwork, which holds several LayerDense objects by composition, each acting as a layer within the neural network.
As inputs to the neural network, I pass in a batch of 8 records from a Pandas DataFrame, each containing 100 values of 0 or 1, depending on the participant's preferred options. As target values, I pass in another DataFrame containing the actual genders of each participant, with 0 being male and 1 being female.
These LayerDense objects deal with the forward and backward passes of each layer. Prior to implementing the softmax function and backpropagation, this all worked as expected.
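Roughly, the structure described above looks something like the following sketch; the constructor arguments and weight initialisation are assumptions based on the description, not the actual source code.

import numpy as np

class LayerDense:
    # one layer: holds its own weights/biases and performs its forward pass
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        self.inputs = inputs
        self.output = np.dot(inputs, self.weights) + self.biases
        return self.output

class NeuralNetwork:
    # composition: the network is an ordered list of LayerDense objects
    def __init__(self, layer_sizes):
        self._network = [LayerDense(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]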
My current issue is getting the evaluate() function within the program to run as expected & getting the run() function to handle this information correctly.
In theory, the evaluate() function should return the loss of each neuron, and the run() function should handle this and run the backward pass through each neuron, adjusting its weights and biases appropriately.
What actually happens is that my final classification outputs, which are the confidence levels in the predictions (values closer to 0 representing a male gender prediction and values closer to 1 representing a female gender prediction), come out no better than guessing.
Using categorical cross-entropy as my loss function, how would I properly implement backpropagation in this situation? What may I be doing wrong here?
All resource links used to get this far and the whole source code will be linked below.
Current evaluation code
def evaluate(self):
    #Target values are the y values that we want to be predicting correctly
    #You can calculate the loss of a categorical neural network (basically most NNs) by using
    #categorical cross-entropy
    #Using one-hot encoding to calculate the categorical cross-entropy of the data (loss)
    #In one-hot encoding, we assign the target class position we want in our array of outputs
    #Then make an array of 0s of the same length as the outputs but put a 1 in the target class position
    #This basically simplifies to just the negative natural logarithm of the predicted target value
    #The following code will represent the confidence values in the predictions made by the NN
    #For this to work, if categorical, the number of outputs must equal the number of possible class targets
    #E.g. for gender, there are two possible class targets (0 and 1), so two output neurons
    #The string can be changed to the attribute in the table that you shall be predicting
    #A short but ugly way of getting a start on this task
    '''
    loss = -np.log(self._network[-1].output[range(len(self._network[-1].output)),target_values.loc[:,"gender"]])
    average_loss = np.mean(loss)
    '''
    #A nicer way to accomplish the same thing
    samples = len(self._network[-1].output)
    #Clip the values so we don't get any infinity errors if a confidence level happens to be spot on
    y_pred_clipped = np.clip(self._network[-1].output, 1e-7, 1 - 1e-7)
    #If one-hot encoding has not been passed in
    if len(self._target_values.shape) == 1:
        #Select the confidence assigned to the correct class of each sample, based on its position
        correct_confidences = y_pred_clipped[range(samples), self._target_values[:samples]]
    elif len(self._target_values.shape) == 2:
        #One-hot encoding has been used in this scenario
        correct_confidences = np.sum(y_pred_clipped * self._target_values[:samples], axis=1)
    #Calculate the loss and return
    loss = -np.log(correct_confidences)
    return loss, correct_confidences
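As a standalone illustration of the categorical cross-entropy described in the comments above, here is a small NumPy sketch on a toy batch; the numbers are made up for the example.

import numpy as np

# three samples, two output neurons (softmax confidences for class 0 and class 1)
y_pred = np.array([[0.7, 0.3],
                   [0.1, 0.9],
                   [0.5, 0.5]])
# sparse labels: the correct class index for each sample
y_true = np.array([0, 1, 1])

# clip to avoid log(0)
y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

# pick the confidence assigned to the correct class of each sample...
correct_confidences = y_pred_clipped[range(len(y_pred)), y_true]

# ...and take the negative natural log, exactly as the one-hot derivation simplifies to
loss = -np.log(correct_confidences)
print(loss, loss.mean())  # per-sample losses and the average loss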
Current run() code
def run(self, **kwargs):
    epochs = kwargs['epochs']
    #Start by putting the initial inputs into the input layer and generating the network
    self._network[0].forward(self._inputs)
    for i in range(len(self._network)-1):
        #Using the previous layer's outputs as the next layer's inputs
        self._network[i+1].forward(self._network[i].output)
    for i in range(epochs):
        #Forward pass
        self._network[0].forward_pass(self._inputs)
        for i in range(len(self._network)-1):
            output = self._network[i+1].forward_pass(self._network[i].output)
        #Generates the values for the loss function, used for training over multiple passes
        #Backbone of backpropagation
        loss = neural.evaluate()
        #Backward pass
        #Somehow find a way to derive the evaluate function on predicted values and target values
        error, confidences = [np.e**-x for x in loss]
        confidences = [np.e**-x for x in confidences]
        error = confidences
        for i in range(len(self._network)-1, -1):
            error = self._network[i-1].backward(error, self._learning_rate)
        print('Epoch %d/%d' % (i+1, epochs))
    #Start by putting the initial inputs into the input layer
    self._network[0].forward(self._testing_data)
    for i in range(len(self._network)-1):
        #Using the previous layer's outputs as the next layer's inputs
        self._network[i+1].forward(self._network[i].output)
    print("The network's testing outputs were:", self._network[-1].output)
Backward pass code which runs for each layer
def backward(self, output_error, learning_rate):
    #The error of this layer's inputs is equal to its output error multiplied by the
    #transposed weights of the layer
    input_error = np.dot(output_error, self.weights.T)
    #The error of the weights in this layer is equal to the transposed matrix of inputs fed into the layer
    #multiplied by the error of the output from this layer
    weights_error = np.dot(self.inputs.T, output_error)
    # dBias = output_error
    # update parameters
    self.weights -= learning_rate * weights_error
    self.biases -= learning_rate * output_error
    return input_error
Aforementioned softmax function within forward() function of LayerDense
elif self._activation_function.lower() == 'softmax':
    #Exponentiate (e to the power of x) the values and subtract the largest value of the layer to prevent overflow
    #Afterwards, normalise (put as relative fractions) the output values
    #In theory, to get the max value out of each batch, axis should be set to 1 and keepdims should be True
    neuron_output = np.exp(neuron_output - np.max(layer_output, axis=0)) / np.sum(np.exp(layer_output), axis=0)
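For comparison, a standalone numerically stable softmax over a batch (rows are samples) usually looks like the sketch below; this is not the LayerDense code from the question, just the standard formulation.

import numpy as np

def softmax(layer_output):
    # subtract the per-row max so np.exp never overflows; keepdims keeps shapes broadcastable
    shifted = layer_output - np.max(layer_output, axis=1, keepdims=True)
    exp_values = np.exp(shifted)
    # normalise each row so the outputs sum to 1 and can be read as probabilities
    return exp_values / np.sum(exp_values, axis=1, keepdims=True)

print(softmax(np.array([[1.0, 2.0, 3.0], [1000.0, 1001.0, 1002.0]])))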
Mentioned SentDex tutorial: https://www.youtube.com/playlist?list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3
Mentioned TDS article: https://towardsdatascience.com/math-neural-network-from-scratch-in-python-d6da9f29ce65
Source code: https://github.com/NewDeveloper911/Python-Collection/blob/master/neural%20network/nn

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn for non-neural network implementations

I am working on a project where I am getting a strange error from the automatic differentiator in PyTorch.
I am trying to minimize a function with respect to x values, using the code at the bottom of this post. As I understand it, I should be able to make an initial guess, set the requires_grad flag to True, run the forward pass (scores = alpha(Xsamples, model, robustness)), get the gradients with scores.backward(), and then update my initial guess accordingly with optimizer.step(). However, when I run this I get the error 'RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn', which I don't understand because I have set my initial guess to require gradients. I looked on forums for help, but most of the answers were about training neural networks, so their fixes did not work in this case. Any guidance on this would be greatly appreciated, thank you.
epoch = 100
learning_rate = 0.01
N = 1
Xsamples = torch.randn(1, 2)
Xsamples.requires_grad = True
optimizer = torch.optim.SGD([Xsamples], lr=learning_rate)
for i in range(epoch):
    scores = alpha(Xsamples, model, robustness)
    scores.backward()  # dscore/dx
    optimizer.step()
    optimizer.zero_grad()
return Xsamples
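For reference, the same pattern (optimizing an input tensor rather than network weights) does run without this error when the objective is built entirely from differentiable tensor operations; the quadratic objective below is just a stand-in assumption for alpha(Xsamples, model, robustness).

import torch

def toy_alpha(x):
    # stand-in for alpha(Xsamples, model, robustness): any differentiable scalar function of x
    return ((x - 3.0) ** 2).sum()

Xsamples = torch.randn(1, 2, requires_grad=True)
optimizer = torch.optim.SGD([Xsamples], lr=0.01)

for i in range(100):
    optimizer.zero_grad()
    score = toy_alpha(Xsamples)   # must keep a grad_fn, i.e. no .detach()/.numpy()/.item() inside
    score.backward()              # d(score)/d(Xsamples)
    optimizer.step()

print(Xsamples)  # moves toward [[3.0, 3.0]]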

LSTM Autoencoder problems

TLDR:
Autoencoder underfits timeseries reconstruction and just predicts average value.
Question Set-up:
Here is a summary of my attempt at a sequence-to-sequence autoencoder. This image was taken from this paper: https://arxiv.org/pdf/1607.00148.pdf
Encoder: Standard LSTM layer. Input sequence is encoded in the final hidden state.
Decoder: LSTM Cell (I think!). Reconstruct the sequence one element at a time, starting with the last element x[N].
Decoder algorithm is as follows for a sequence of length N:
Get the decoder's initial hidden state hs[N]: just use the encoder's final hidden state.
Reconstruct the last element in the sequence: x[N] = w.dot(hs[N]) + b.
Same pattern for the other elements: x[i] = w.dot(hs[i]) + b.
Use x[i] and hs[i] as inputs to the LSTMCell to get x[i-1] and hs[i-1].
Minimum Working Example:
Here is my implementation, starting with the encoder:
class SeqEncoderLSTM(nn.Module):
    def __init__(self, n_features, latent_size):
        super(SeqEncoderLSTM, self).__init__()
        self.lstm = nn.LSTM(
            n_features,
            latent_size,
            batch_first=True)

    def forward(self, x):
        _, hs = self.lstm(x)
        return hs
Decoder class:
class SeqDecoderLSTM(nn.Module):
    def __init__(self, emb_size, n_features):
        super(SeqDecoderLSTM, self).__init__()
        self.cell = nn.LSTMCell(n_features, emb_size)
        self.dense = nn.Linear(emb_size, n_features)

    def forward(self, hs_0, seq_len):
        x = torch.tensor([])
        # Final hidden and cell state from encoder
        hs_i, cs_i = hs_0
        # reconstruct first element with encoder output
        x_i = self.dense(hs_i)
        x = torch.cat([x, x_i])
        # reconstruct remaining elements
        for i in range(1, seq_len):
            hs_i, cs_i = self.cell(x_i, (hs_i, cs_i))
            x_i = self.dense(hs_i)
            x = torch.cat([x, x_i])
        return x
Bringing the two together:
class LSTMEncoderDecoder(nn.Module):
    def __init__(self, n_features, emb_size):
        super(LSTMEncoderDecoder, self).__init__()
        self.n_features = n_features
        self.hidden_size = emb_size
        self.encoder = SeqEncoderLSTM(n_features, emb_size)
        self.decoder = SeqDecoderLSTM(emb_size, n_features)

    def forward(self, x):
        seq_len = x.shape[1]
        hs = self.encoder(x)
        hs = tuple([h.squeeze(0) for h in hs])
        out = self.decoder(hs, seq_len)
        return out.unsqueeze(0)
And here's my training function:
def train_encoder(model, epochs, trainload, testload=None, criterion=nn.MSELoss(), optimizer=optim.Adam, lr=1e-6, reverse=False):
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print(f'Training model on {device}')
    model = model.to(device)
    opt = optimizer(model.parameters(), lr)
    train_loss = []
    valid_loss = []
    for e in tqdm(range(epochs)):
        running_tl = 0
        running_vl = 0
        for x in trainload:
            x = x.to(device).float()
            opt.zero_grad()
            x_hat = model(x)
            if reverse:
                x = torch.flip(x, [1])
            loss = criterion(x_hat, x)
            loss.backward()
            opt.step()
            running_tl += loss.item()
        if testload is not None:
            model.eval()
            with torch.no_grad():
                for x in testload:
                    x = x.to(device).float()
                    loss = criterion(model(x), x)
                    running_vl += loss.item()
            valid_loss.append(running_vl / len(testload))
            model.train()
        train_loss.append(running_tl / len(trainload))
    return train_loss, valid_loss
Data:
Large dataset of events scraped from the news (ICEWS). Various categories exist that describe each event. I initially one-hot encoded these variables, expanding the data to 274 dimensions. However, in order to debug the model, I've cut it down to a single sequence that is 14 timesteps long and only contains 5 variables. Here is the sequence I'm trying to overfit:
tensor([[0.5122, 0.0360, 0.7027, 0.0721, 0.1892],
        [0.5177, 0.0833, 0.6574, 0.1204, 0.1389],
        [0.4643, 0.0364, 0.6242, 0.1576, 0.1818],
        [0.4375, 0.0133, 0.5733, 0.1867, 0.2267],
        [0.4838, 0.0625, 0.6042, 0.1771, 0.1562],
        [0.4804, 0.0175, 0.6798, 0.1053, 0.1974],
        [0.5030, 0.0445, 0.6712, 0.1438, 0.1404],
        [0.4987, 0.0490, 0.6699, 0.1536, 0.1275],
        [0.4898, 0.0388, 0.6704, 0.1330, 0.1579],
        [0.4711, 0.0390, 0.5877, 0.1532, 0.2201],
        [0.4627, 0.0484, 0.5269, 0.1882, 0.2366],
        [0.5043, 0.0807, 0.6646, 0.1429, 0.1118],
        [0.4852, 0.0606, 0.6364, 0.1515, 0.1515],
        [0.5279, 0.0629, 0.6886, 0.1514, 0.0971]], dtype=torch.float64)
And here is the custom Dataset class:
class TimeseriesDataSet(Dataset):
    def __init__(self, data, window, n_features, overlap=0):
        super().__init__()
        if isinstance(data, (np.ndarray)):
            data = torch.tensor(data)
        elif isinstance(data, (pd.Series, pd.DataFrame)):
            data = torch.tensor(data.copy().to_numpy())
        else:
            raise TypeError(f"Data should be ndarray, series or dataframe. Found {type(data)}.")
        self.n_features = n_features
        self.seqs = torch.split(data, window)

    def __len__(self):
        return len(self.seqs)

    def __getitem__(self, idx):
        try:
            return self.seqs[idx].view(-1, self.n_features)
        except TypeError:
            raise TypeError("Dataset only accepts integer index/slices, not lists/arrays.")
Problem:
The model only learns the average, no matter how complex I make the model or how long I train it.
Predicted/Reconstruction:
Actual:
My research:
This problem is identical to the one discussed in this question: LSTM autoencoder always returns the average of the input sequence
The problem in that case ended up being that the objective function was averaging the target timeseries before calculating loss. This was due to some broadcasting errors because the author didn't have the right sized inputs to the objective function.
In my case, I do not see this being the issue. I have checked and double checked that all of my dimensions/sizes line up. I am at a loss.
Other Things I've Tried
I've tried this with varied sequence lengths from 7 timesteps to 100 time steps.
I've tried with varied number of variables in the time series. I've tried with univariate all the way to all 274 variables that the data contains.
I've tried with various reduction parameters on the nn.MSELoss module. The paper calls for sum, but I've tried both sum and mean. No difference.
The paper calls for reconstructing the sequence in reverse order (see graphic above). I have tried this method using flipud on the original input (after the forward pass but before calculating the loss). It makes no difference.
I tried making the model more complex by adding an extra LSTM layer in the encoder.
I've tried playing with the latent space. I've tried from 50% of the input number of features to 150%.
I've tried overfitting a single sequence (provided in the Data section above).
Question:
What is causing my model to predict the average and how do I fix it?
Okay, after some debugging I think I know the reasons.
TLDR
You try to predict the next timestep's value instead of the difference between the current timestep and the previous one
Your hidden_features number is too small, making the model unable to fit even a single sample
Analysis
Code used
Let's start with the code (model is the same):
import seaborn as sns
import matplotlib.pyplot as plt

def get_data(subtract: bool = False):
    # (1, 14, 5)
    input_tensor = torch.tensor(
        [
            [0.5122, 0.0360, 0.7027, 0.0721, 0.1892],
            [0.5177, 0.0833, 0.6574, 0.1204, 0.1389],
            [0.4643, 0.0364, 0.6242, 0.1576, 0.1818],
            [0.4375, 0.0133, 0.5733, 0.1867, 0.2267],
            [0.4838, 0.0625, 0.6042, 0.1771, 0.1562],
            [0.4804, 0.0175, 0.6798, 0.1053, 0.1974],
            [0.5030, 0.0445, 0.6712, 0.1438, 0.1404],
            [0.4987, 0.0490, 0.6699, 0.1536, 0.1275],
            [0.4898, 0.0388, 0.6704, 0.1330, 0.1579],
            [0.4711, 0.0390, 0.5877, 0.1532, 0.2201],
            [0.4627, 0.0484, 0.5269, 0.1882, 0.2366],
            [0.5043, 0.0807, 0.6646, 0.1429, 0.1118],
            [0.4852, 0.0606, 0.6364, 0.1515, 0.1515],
            [0.5279, 0.0629, 0.6886, 0.1514, 0.0971],
        ]
    ).unsqueeze(0)
    if subtract:
        initial_values = input_tensor[:, 0, :]
        input_tensor -= torch.roll(input_tensor, 1, 1)
        input_tensor[:, 0, :] = initial_values
    return input_tensor

if __name__ == "__main__":
    torch.manual_seed(0)
    HIDDEN_SIZE = 10
    SUBTRACT = False

    input_tensor = get_data(SUBTRACT)
    model = LSTMEncoderDecoder(input_tensor.shape[-1], HIDDEN_SIZE)
    optimizer = torch.optim.Adam(model.parameters())
    criterion = torch.nn.MSELoss()

    for i in range(1000):
        outputs = model(input_tensor)
        loss = criterion(outputs, input_tensor)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        print(f"{i}: {loss}")
        if loss < 1e-4:
            break

    # Plotting
    sns.lineplot(data=outputs.detach().numpy().squeeze())
    sns.lineplot(data=input_tensor.detach().numpy().squeeze())
    plt.show()
What it does:
get_data either works on the data you provided (if subtract=False) or (if subtract=True) it subtracts the value of the previous timestep from the current timestep
The rest of the code optimizes the model until a loss of 1e-4 is reached (so we can compare how the model's capacity, and increasing it, helps, and what happens when we use the difference of timesteps instead of the raw timesteps)
We will only vary the HIDDEN_SIZE and SUBTRACT parameters!
NO SUBTRACT, SMALL MODEL
HIDDEN_SIZE=5
SUBTRACT=False
In this case we get a straight line. The model is unable to fit and grasp the phenomena presented in the data (hence the flat lines you mentioned).
1000 iterations limit reached
SUBTRACT, SMALL MODEL
HIDDEN_SIZE=5
SUBTRACT=True
The targets are now far from flat lines, but the model is unable to fit due to too little capacity.
1000 iterations limit reached
NO SUBTRACT, LARGER MODEL
HIDDEN_SIZE=100
SUBTRACT=False
It got a lot better and our target was hit after 942 steps. No more flat lines; the model's capacity seems quite fine (for this single example!).
SUBTRACT, LARGER MODEL
HIDDEN_SIZE=100
SUBTRACT=True
Although the graph does not look that pretty, we got to the desired loss after only 215 iterations.
Finally
Usually, use the difference of timesteps instead of the raw timesteps (or some other transformation; see here for more info about that). Otherwise, the neural network will try to simply... copy the output from the previous step (as that's the easiest thing to do). Some minimum will be found this way, and getting out of it will require more capacity.
When you use the difference between timesteps there is no way to "extrapolate" the trend from the previous timestep; the neural network has to learn how the function actually varies
Use a larger model (for the whole dataset you should try something like 300, I think), but you can simply tune that one.
Don't use flipud. Use bidirectional LSTMs; this way you can get information from the forward and backward passes of the LSTM (not to be confused with backprop!). This should also boost your score; a minimal sketch of a bidirectional encoder is shown right after this list.
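Here is that sketch of a bidirectional encoder; concatenating the two directions into a single latent vector is one possible design choice, not the only one.

import torch
import torch.nn as nn

class BiSeqEncoderLSTM(nn.Module):
    def __init__(self, n_features, latent_size):
        super().__init__()
        self.lstm = nn.LSTM(n_features, latent_size, batch_first=True, bidirectional=True)

    def forward(self, x):
        _, (h_n, c_n) = self.lstm(x)
        # h_n has shape (num_directions, batch, latent_size); concatenate both directions
        hs = torch.cat([h_n[0], h_n[1]], dim=-1)   # (batch, 2 * latent_size)
        cs = torch.cat([c_n[0], c_n[1]], dim=-1)
        return hs, cs

# The decoder's LSTMCell and Linear layer would then need emb_size = 2 * latent_size.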
Questions
Okay, question 1: You are saying that for variable x in the time series, I should train the model to learn x[i] - x[i-1] rather than the value of x[i]? Am I correctly interpreting?
Yes, exactly. Differencing removes the urge of the neural network to base its predictions too heavily on the past timestep (by simply taking the last value and maybe changing it a little).
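As a small sketch of what that means in code (mirroring the get_data(subtract=True) transform above), here is the differencing and how to undo it with a cumulative sum:

import torch

x = torch.randn(1, 14, 5)            # (batch, timesteps, features)

# difference: keep x[:, 0] as-is, then store x[:, t] - x[:, t-1] for t >= 1
diffed = x - torch.roll(x, 1, dims=1)
diffed[:, 0, :] = x[:, 0, :]

# the model is trained to reconstruct `diffed`; the original series is recovered by
reconstructed = torch.cumsum(diffed, dim=1)
assert torch.allclose(reconstructed, x, atol=1e-6)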
Question 2: You said my calculations for zero bottleneck were incorrect. But, for example, let's say I'm using a simple dense network as an autoencoder. Getting the right bottleneck indeed depends on the data. But if you make the bottleneck the same size as the input, you get the identity function.
Yes, assuming there is no non-linearity involved, which makes the thing harder (see here for a similar case). In the case of LSTMs there are non-linearities; that's one point.
Another is that we are accumulating timesteps into a single encoder state. So essentially we would have to accumulate the identities of all timesteps into single hidden and cell states, which is highly unlikely.
One last point: depending on the length of the sequence, LSTMs are prone to forgetting some of the least relevant information (that's what they were designed to do, not just to remember everything), which makes it even more unlikely.
Is num_features * num_timesteps not a bottleneck of the same size as the input, and therefore shouldn't it facilitate the model learning the identity?
It is, but it assumes you have num_timesteps for each data point, which is rarely the case (it might be here). The identity, and why it is hard for the network to learn with non-linearities, was answered above.
One last point about identity functions: if they were actually easy to learn, ResNet architectures would be unlikely to succeed. The network could converge to the identity and make "small fixes" to the output without them, which is not the case.
I'm curious about the statement: "always use difference of timesteps instead of timesteps". It seems to have some normalizing effect by bringing all the features closer together, but I don't understand why this is key. Having a larger model seemed to be the solution, and the subtraction is just helping.
The key here was, indeed, increasing model capacity. The subtraction trick really depends on the data. Let's imagine an extreme situation:
We have 100 timesteps, single feature
Initial timestep value is 10000
Other timestep values vary by 1 at most
What would the neural network do (what is easiest here)? It would probably discard this change of 1 or less as noise and just predict 10000 for all of them (especially if some regularization is in place), as being off by 1 in 10000 is not much.
What if we subtract? The whole neural network's loss is then in the [0, 1] margin for each timestep instead of [0, 10001], hence it is more severe to be wrong.
And yes, it is connected to normalization in some sense, come to think of it.
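A tiny numerical sketch of that extreme case (the series and the lazy constant predictions are made up to match the description above):

import torch

# 100 timesteps, single feature: starts at 10000 and changes by at most 1 per step
steps = torch.empty(100).uniform_(-1, 1)
x = 10000 + torch.cumsum(steps, dim=0)

# "lazy" prediction on the raw series: output 10000 everywhere
lazy_raw = torch.full_like(x, 10000.0)
print(((lazy_raw - x).abs().max() / x.abs().max()).item())     # tiny relative to the scale of the data

# the same laziness on the differenced series: output 0 everywhere
dx = x[1:] - x[:-1]
lazy_diff = torch.zeros_like(dx)
print(((lazy_diff - dx).abs().max() / dx.abs().max()).item())  # relative error of 1.0, i.e. 100%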

What is the state of the art way of doing regression with probability in pytorch

All the regression examples I find are examples where you predict a real number, and unlike with classification you don't get the confidence the model had when predicting that number. In reinforcement learning I have done it another way: the output is instead the mean and std, and then you sample from that distribution. Then you know how confident the model is when predicting every value. Now I can't find how to do this using supervised learning in PyTorch. The problem is that I don't understand how to sample from the distribution to get the actual value while training, or what sort of loss function I should use; I'm not sure how, for example, MSE or L1Smooth would work.
Is there any example out there where this is done in PyTorch in a robust and state-of-the-art way?
The key point is that you do not need to sample from the NN-produced distribution. All you need is to optimize the likelihood of the target value under the NN distribution.
There is an example in the official PyTorch example on VAE (https://github.com/pytorch/examples/tree/master/vae), though for multidimensional Bernoulli distribution.
Since PyTorch 0.4, you can use torch.distributions: instantiate distribution distro with outputs of your NN and then optimize -distro.log_prob(target).
EDIT: As requested in a comment, a complete example of using the torch.distributions module.
First, we create a heteroscedastic dataset:
import numpy as np
import torch
X = np.random.uniform(size=300)
Y = X + 0.25*X*np.random.normal(size=X.shape[0])
We build a trivial model, which is perfectly able to match the generative process of our data:
class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.mean_coeff = torch.nn.Parameter(torch.Tensor([0]))
        self.var_coeff = torch.nn.Parameter(torch.Tensor([1]))

    def forward(self, x):
        return torch.distributions.Normal(self.mean_coeff * x, self.var_coeff * x)

mdl = Model()
optim = torch.optim.SGD(mdl.parameters(), lr=1e-3)
Initialization of the model makes it always produce a standard normal, which is a poor fit for our data, so we train (note it is a very stupid batch training, but demonstrates that you can output a set of distributions for your batch at once):
for _ in range(2000):  # epochs
    dist = mdl(torch.from_numpy(X).float())
    obj = -dist.log_prob(torch.from_numpy(Y).float()).mean()
    optim.zero_grad()
    obj.backward()
    optim.step()
Eventually, the learned parameters should match the values we used to construct the Y.
print(mdl.mean_coeff, mdl.var_coeff)
# tensor(1.0150) tensor(0.2597)
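At prediction time you do not need to sample either; you can read the mean and the uncertainty straight off the returned distribution. A small usage sketch for the model above:

# query the fitted model on new inputs
x_new = torch.linspace(0.1, 1.0, 5)
with torch.no_grad():
    dist = mdl(x_new)
    print(dist.mean)     # point predictions (the learned mean for each x)
    print(dist.stddev)   # per-point uncertainty (grows with x in this heteroscedastic model)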

How to plot a ROC curve with Tensorflow and scikit-learn?

I'm trying to plot the ROC curve from a modified version of the CIFAR-10 example provided by tensorflow. It's now for 2 classes instead of 10.
The output of the network are called logits and take the form:
[[-2.57313061  2.57966399]
 [ 0.04221377 -0.04033273]
 [-1.42880082  1.43337202]
 [-2.7692945   2.78173304]
 [-2.48195744  2.49331546]
 [ 2.0941515  -2.10268974]
 [-3.51670194  3.53267646]
 [-2.74760485  2.75617766]
 ...]
First of all, what do these logits actually represent? The final layer in the network is a "softmax linear" of form WX+b.
The model is able to calculate accuracy by calling
top_k_op = tf.nn.in_top_k(logits, labels, 1)
Then once the graph has been initialized:
predictions = sess.run([top_k_op])
predictions_int = np.array(predictions).astype(int)
true_count += np.sum(predictions)
...
precision = true_count / total_sample_count
This works fine.
But now how can I plot a ROC curve from this?
I've been trying the "sklearn.metrics.roc_curve()" function (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve) but I don't know what to use as my "y_score" parameter.
Any help would be appreciated!
'y_score' here should be an array of the probability that each sample will be classified as positive (if positive is labeled as 1 in your y_true array).
Actually, if your network uses softmax as the last layer, then the model should output the probability of each category for each instance, but the data you have given here doesn't conform to that format. I checked the example code: https://github.com/tensorflow/tensorflow/blob/r0.10/tensorflow/models/image/cifar10/cifar10.py
It seems to use a layer called softmax_linear. I know little about this example, but I guess you should process the output with something like a logistic function to turn it into probabilities.
Then just feed it along with your true label 'y_true' to the scikit-learn function:
y_score = np.array(output)[:,1]
roc_curve(y_true, y_score)
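A possible end-to-end sketch of that idea, applying a softmax over the two-column logits (equivalent to a logistic on their difference) and then calling scikit-learn; the variable names output and y_true are taken from the snippets above, everything else is an assumption:

import numpy as np
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

logits = np.array(output)              # shape (n_samples, 2), raw scores from softmax_linear
# softmax over the two columns turns the logits into probabilities
exp_logits = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp_logits / exp_logits.sum(axis=1, keepdims=True)

y_score = probs[:, 1]                  # probability of the positive class (label 1)
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(auc(fpr, tpr))

plt.plot(fpr, tpr)
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.show()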
import tensorflow as tf

tp = []  # the true positive rate list
fp = []  # the false positive rate list
total = len(fp)
writer = tf.train.SummaryWriter("/tmp/tensorboard_roc")
for idx in range(total):
    summt = tf.Summary()
    summt.value.add(tag="roc", simple_value=tp[idx])
    writer.add_summary(summt, tp[idx] * 100)  # act as global_step
    writer.flush()
then start a tensorboard:
tensorboard --logdir=/tmp/tensorboard_roc
For details and code, you can visit my blog: http://blog.csdn.net/mao_feng/article/details/54731098
