Understanding TensorFlow binary text classification results - Python

For one of my first attempts at using TensorFlow, I followed the binary text classification tutorial: https://www.tensorflow.org/tutorials/keras/text_classification_with_hub#evaluate_the_model.
I was able to follow the tutorial fine, but then I wanted to inspect the results more closely; namely, I wanted to see what prediction the model made for each item in the test data set.
In short, I wanted to see which "label" (1 or 0) it would predict for a given movie review.
So I tried:
results = model.predict(test_data.batch(512))
and then:
for i in results:
    print(i)
This gives me close to what I would expect: a list of 25,000 entries (one for each movie review).
But the value of each item in the array is not what I expected. I was expecting a predicted label, either 0 (for negative) or 1 (for positive).
But instead I get this:
[0.22731477]
[2.1199656]
[-2.2581818]
[-2.7382329]
[3.8788114]
[4.6112833]
[6.125982]
[5.100685]
[1.1270659]
[1.3210837]
[-5.2568426]
[-2.9904163]
[0.17620209]
[-1.1293088]
[2.8757455]
...and so on for 25,000 entries.
Can someone help me understand what these numbers mean?
Am I misunderstanding what the "predict" method does, or (since these numbers look similar to the word-embedding vectors introduced in the first layer of the model) am I misunderstanding how the prediction relates to the word-embedding layer and the ultimate classification label?
I know this is a major newbie question, but I appreciate your help and patience :)

According to the link you provided, the behaviour comes from your output activation function, or rather the lack of one. The code uses a Dense layer with 1 neuron and no activation function, so it just multiplies the output of the previous layer by the weights, adds the bias, and sums everything together. The output you get therefore has a range between -infinity (negative class) and +infinity (positive class). If you really want your output between zero and one, you need an activation function such as a sigmoid: model.add(tf.keras.layers.Dense(1, activation='sigmoid')). Now everything is mapped to the range 0 to 1, so you can classify a sample as the negative class if the output is less than 0.5 (the midpoint) and vice versa.
Actually, your understanding of the predict function is correct. You simply did not add an activation to match your assumption; that's why you got those raw scores instead of values between 0 and 1.
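If you'd rather not retrain, here is a minimal sketch (assuming model and test_data from the tutorial) of how you could map the raw outputs to 0/1 labels after the fact:
import tensorflow as tf

# Raw, unbounded scores (logits) straight from the model
logits = model.predict(test_data.batch(512))

# Sigmoid squashes each logit into the (0, 1) range
probabilities = tf.sigmoid(logits)

# Threshold at 0.5 to get hard labels: 0 = negative, 1 = positive
predicted_labels = tf.cast(probabilities > 0.5, tf.int32)

for prob, label in zip(probabilities[:5].numpy(), predicted_labels[:5].numpy()):
    print(prob, '->', label)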

Related

In Tensorflow, why add an activation function to a model only when preparing to export it?

In the TensorFlow ML Basics with Keras tutorial on basic text classification, when preparing the trained model for export, the tutorial suggests including the TextVectorization layer in the Model so it can "process raw strings". I understand why to do this.
But then the code snippet is:
export_model = tf.keras.Sequential([
    vectorize_layer,
    model,
    layers.Activation('sigmoid')
])
Why, when preparing the model for export, does the tutorial also include a new activation layer, layers.Activation('sigmoid')? Why not incorporate this layer into the original model?
Before the TextVectorization layer was introduced, you had to manually clean up your raw strings. This usually meant removing punctuation, lower-casing, tokenizing, and so forth:
#Raw String
"Furthermore, he asked himself why it happened to Billy?"
#Remove punctuation
"Furthermore he asked himself why it happened to Billy"
#Lower-case
"furthermore he asked himself why it happened to billy"
#Tokenize
['furthermore', 'he', 'asked', 'himself', 'why', 'it', 'happened', 'to', 'billy']
If you include the TextVectorization layer in your model when you export, you can essentially feed raw strings into your model for prediction without having to clean them up first.
Regarding your second question: I also find it rather odd that the sigmoid activation function was not used. I imagine that the last layer was given a linear activation function because of the dataset and its samples: they can be split into two classes, making this a linearly separable problem.
The problem with a linear activation function during inference is that it can output negative values:
# With linear activation function
examples = [
"The movie was great!",
"The movie was okay.",
"The movie was terrible..."
]
export_model.predict(examples)
'''
array([[ 0.4543204 ],
[-0.26730654],
[-0.61234593]], dtype=float32)
'''
For example, the value -0.26730654 could suggest that the review "The movie was okay." is negative, but this is not necessarily the case. What one actually wants to predict is the probability that a particular sample belongs to a particular class. Therefore, a sigmoid function is applied at inference time to squeeze the output values between 0 and 1. The output can then be interpreted as the probability that sample x belongs to class n:
# With sigmoid activation function
examples = [
"The movie was great!",
"The movie was okay.",
"The movie was terrible..."
]
export_model.predict(examples)
'''
array([[0.6116659 ],
[0.43356845],
[0.35152423]], dtype=float32)
'''
Sometimes you want to know the model's answer before the sigmoid, as it may contain useful information, for example about the shape of the output distribution and how it evolves. In such a scenario it's convenient to have the final scaling as a separate entity; otherwise one would have to remove and re-add the sigmoid layer, which means more lines of code and more possible errors. So it may be good practice to apply the sigmoid at the very end, just before saving/exporting. Or it may simply be a convention.
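As a tiny sketch of that pattern (assuming a logit-valued model as in the tutorial; vectorized_examples is a hypothetical batch of already-vectorized inputs), you can look at both views without touching the model itself:
# 'model' ends in Dense(1) with no activation, so predict() returns logits.
logits = model.predict(vectorized_examples)

# The exported/serving view applies the sigmoid on top of the same scores.
probabilities = tf.sigmoid(logits)

print(logits[:3])         # unbounded scores, useful for inspecting the distribution
print(probabilities[:3])  # the same scores squeezed into (0, 1)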

Text Generation in Deep Learning Tensorflow

I am currently doing text generation in TensorFlow. After training the model, when predicting text and decoding the numbers back to text, why do we use tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()?
1.: tf.random.categorical samples one of the indices (not the value itself) from the categorical distribution defined by the logits it takes as an argument - predictions in this case; indices with higher logits are more likely to be drawn. It returns only one sample since we set num_samples=1.
(For a better answer on what exactly tf.random.categorical does, look here.)
2.: [-1,0] is simple index slicing: we take the first element of the last row.
We take it from the last row since the output is always just the input offset by one position; hence the last row holds the new token, and that's what we're searching for in this case.
3.: numpy() is used because we don't (usually) want to deal with tensors at this point. tf.random.categorical returns a tensor, so we convert it with numpy().
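A small self-contained example (with made-up logits) showing what that line does:
import tensorflow as tf

# Toy logits for a 5-token vocabulary over a sequence of 3 time steps.
predictions = tf.constant([
    [0.1, 0.2, 0.3, 0.2, 0.2],
    [0.5, 0.1, 0.1, 0.2, 0.1],
    [0.0, 0.0, 5.0, 0.0, 0.0],  # strongly favours token 2
])

# One sample per time step, shape (3, 1); keep only the last step's sample.
sampled = tf.random.categorical(predictions, num_samples=1)
next_token_id = sampled[-1, 0].numpy()
print(next_token_id)  # very likely 2, given the skewed logits above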

How to train tensorflow model when some outputs do not matter?

For example, I want to train a TensorFlow model which has 2 outputs. If the first output is 1, then I look at the second output; but if the first output is 0, then the second output doesn't matter. Is there a way in TensorFlow to set the error on the second output to 0 when the first output is 0, or do I have to specify all the outputs? Sorry if that is a dumb question, but I'm new to TensorFlow.
A better example: I want to check if there is a dot in the input image. My model has 5 outputs. The first one predicts whether there is a dot in the image (values from 0 to 1). The next 4 outputs show where that dot is in the image (position, width and height). So if I feed the model an image without a dot, what should I put in the output: [0,anything,anything,anything,anything] or [0,0,0,0,0]? And if the first one, how do I do it?
You need to define your loss so that it does not consider the second portion of the output if the first one is zero. If the image does not have a dot in it, you may use arbitrary numbers for the last 4 targets, because when the first number is zero they are not considered in your loss; see the sketch below. I hope this helps.
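A hypothetical sketch of such a loss, assuming each target is laid out as [has_dot, x, y, w, h] to match your 5 outputs:
import tensorflow as tf

def masked_dot_loss(y_true, y_pred):
    has_dot = y_true[:, 0]  # 1.0 if the image contains a dot, else 0.0

    # Binary cross-entropy on the "is there a dot?" output.
    presence_loss = tf.keras.losses.binary_crossentropy(y_true[:, :1], y_pred[:, :1])

    # Squared error on position/size, zeroed out when no dot is present,
    # so the last four targets can hold anything for dot-free images.
    box_loss = tf.reduce_sum(tf.square(y_true[:, 1:] - y_pred[:, 1:]), axis=-1)

    return presence_loss + has_dot * box_loss

# model.compile(optimizer='adam', loss=masked_dot_loss)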
More reading: object detection and the idea of anchor boxes in the Faster R-CNN paper may help you understand how this can work: https://arxiv.org/pdf/1506.01497.pdf

h2o python balance classes

I'm having problems implementing simple class balancing for an H2ORandomForestEstimator. I'm trying to reproduce a simple example found in Darren Cook's book 'Practical Machine Learning with H2O' (p. 107), written in R.
Working on the Iris dataset, I first artificially unbalance the target variable by cutting out a good share of virginica, keeping only the first 120 rows.
Then I build 3 models: a vanilla one, one where I set balance_classes to True, and a last one where I set balance_classes to True and pass a list for class_sampling_factors to oversample virginica. The list is [1.0, 1.0, 2.5], with entries referring to the classes sorted alphabetically.
I train them, and then output the confusion matrix on the training frame for each one.
I'm expecting an unbalanced output for the first one and a balanced one for the last two, but I always get the same result. I checked the documentation example in Python, and I can't see anything wrong (I may be tired as well).
This is my code:
data_unb = data[1:120, :]  # artificially unbalance the target variable
train, valid = data_unb.split_frame([0.8], seed=12345)

m1 = h2o.estimators.random_forest.H2ORandomForestEstimator(seed=12345)
m2 = h2o.estimators.random_forest.H2ORandomForestEstimator(balance_classes=True, seed=12345)
m3 = h2o.estimators.random_forest.H2ORandomForestEstimator(balance_classes=True, class_sampling_factors=[1.0, 1.0, 2.5], seed=12345)

m1.train(x=list(range(4)), y=4, training_frame=train, validation_frame=valid, model_id='RF_defaults')
m2.train(x=list(range(4)), y=4, training_frame=train, validation_frame=valid, model_id='RF_balanced')
m3.train(x=list(range(4)), y=4, training_frame=train, validation_frame=valid, model_id='RF_class_sampling')

m1.confusion_matrix(train)
m2.confusion_matrix(train)
m3.confusion_matrix(train)
This is my output:
[screenshot: my confusion matrices (wrong)]
And this is my expected output:
[screenshot: expected confusion matrices]
What am I evidently missing? Thanks in advance.
You're not missing anything. The balance_classes option is available in H2O Random Forest, but it's not actually functional. The bug is documented here and should be fixed in the next stable release of H2O. Sorry about the confusion!
It should work for the rest of the H2O algos (except XGBoost). If you wanted to try on a GBM, for example, you'd see it working.
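A quick (untested) sketch of that check, reusing the frames from your code:
from h2o.estimators.gbm import H2OGradientBoostingEstimator

m4 = H2OGradientBoostingEstimator(balance_classes=True,
                                  class_sampling_factors=[1.0, 1.0, 2.5],
                                  seed=12345)
m4.train(x=list(range(4)), y=4, training_frame=train,
         validation_frame=valid, model_id='GBM_balanced')
m4.confusion_matrix(train)  # should now reflect the oversampling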

CNTK & python: How to pass input data to the eval func?

With CNTK I have created a network with 2 input neurons and 1 output neuron.
A line in the training file looks like
|features 1.567518 2.609619 |labels 1.000000
Then the network was trained with BrainScript. Now I want to use the network to predict values. For example: the input data is [1.82, 3.57]. What is the output from the net?
I have tried Python with the following code, but I am new here and the code does not work. So my question is: how do I pass the input data [1.82, 3.57] to the eval function?
On stackoverflow there are some hints, here and here, but this is too abstract for me.
Thank you.
import cntk as ct
import numpy as np
z = ct.load_model("LR_reg.dnn", ct.device.cpu())
input_data = np.array([1.82, 3.57], dtype=np.float32)
pred = z.eval({ z.arguments[0] : input_data })
print(pred)
Here's the most defensive way of doing it. CNTK can be forgiving if you omit some of this when the network is specified with V2 constructs; I'm not sure about a network that was created with V1 code.
Basically you need a pair of brackets for each axis. Which axes exist in BrainScript? There's a batch axis, a sequence axis, and then the static axes of your network. You have one-dimensional data, so the following should work:
input_data= np.array([[[1.82, 3.57]]], dtype=np.float32)
This specifies a batch of one sequence, of length one, containing one 1-d vector of two elements. You can also try omitting the outermost brackets and see if you get the same result.
Update: based on more information from the comments below, we should not forget that the V1 code also saved the parts of the network that compute things like loss and accuracy. If we provide only the features, CNTK will complain that the labels have not been provided. There are two ways to deal with this issue. One possibility is to provide some fake labels so that the network can evaluate these auxiliary operations. Another possibility is to identify the prediction node and use that. If the prediction was called 'p' in V1, this Python code
p = z.find_by_name('p')
should create a CNTK function that only needs the features in order to compute the prediction.
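Putting it together, a sketch (the node name 'p' is an assumption; substitute whatever your BrainScript called the prediction):
import cntk as ct
import numpy as np

z = ct.load_model("LR_reg.dnn", ct.device.cpu())

# Pull out just the prediction node so the loss/metric inputs are not required.
p = z.find_by_name('p')  # 'p' is hypothetical

# One batch entry containing one sequence of one 2-element feature vector.
input_data = np.array([[[1.82, 3.57]]], dtype=np.float32)

pred = p.eval({p.arguments[0]: input_data})
print(pred)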
