For example i want to train tensorflow model which has 2 outputs. If first output is 1 then i look at the second output, but if the first output is 0 then second output doesn't matter. Is there a way in tensorflow to set the error on second output to 0 when the first output is 0 or I have to specify all the outputs. Sorry if that is dumb question but I'm new to tensorflow.
Better example. I want to check if there is a dot in the feeded image. My model has 5 outputs. First one predict if there is a dot in the image(values from 0 to 1).The next 4 outputs shows where is that dot in the image (position, width and height). So if I feed a model with an image without dot what should i put in the output. [0,anything,anything,anything,anything] or [0,0,0,0,0]. And if the first one, how to do it.
You need to define your loss so that it does not consider the second portion of output if the first one is zero. If the image does not have a dot in it you may use an arbitrary number for the last 4 numbers, because if the first number is zero they are not considered in your loss. I hope this helps.
More readings: Object detection and the idea of anchor boxes in the Faster-RNN paper may help to understand how this might work: https://arxiv.org/pdf/1506.01497.pdf
Related
I am currently doing text generation in tensorflow. After training the model, when predicting the text and decoding the numbers to text why we used tf.random.categorical(predictions, num_samples=1)[-1,0].numpy??
1.: tf.random.categorical just returns one of the argument numbers (not the argument itself) that have the greatest probability, based on the probability distribution that it takes as an argument - predictions in this case. It only returns one since we set num_samples=1.
(For a better answer on what exactly tf.random.categorical does, look here.)
2.: [-1,0] is simple index slicing, we just take the first element of the last element.
We take it from the last element since the output is always just the input offset by one position, hence the last element is the new word - and that's what we're searching for in this case.
3.: And numpy() is being used because we don't (usually) want to deal with tensors at this point. The return of tf.random.categorical is a tensor. So we convert it using numpy().
For one of my first attempts at using Tensor flow I've followed the Binary Image Classification tutorial https://www.tensorflow.org/tutorials/keras/text_classification_with_hub#evaluate_the_model.
I was able to follow the tutorial fine, but then I wanted to try to inspect the results more closely, namely I wanted to see what predictions the model made for each item in the test data set.
In short, I wanted to see what "label" (1 or 0) it would predict applies to a given movie review.
So I tried:
results = model.predict(test_data.batch(512))
and then
for i in results:
print(i)
This gives me close to what I would expect. A list of 25,000 entries (one for each movie review).
But the value of each item in the array is not what I would expect. I was expecting to see a predicted label, so either a 0 (for negative) or 1 (for positive).
But instead I get this:
[0.22731477]
[2.1199656]
[-2.2581818]
[-2.7382329]
[3.8788114]
[4.6112833]
[6.125982]
[5.100685]
[1.1270659]
[1.3210837]
[-5.2568426]
[-2.9904163]
[0.17620209]
[-1.1293088]
[2.8757455]
...and so on for 25,000 entries.
Can someone help me understand what these numbers mean.
Am I misunderstanding what the "predict" method does, or (since these number look similar to the word embedding vectors introduced in the first layer of the model) perhaps I am misunderstanding how the prediction relates to the word embedding layer and the ultimate classification label.
I know this a major newbie question. But appreciate your help and patience :)
According to the link that you provided, the problem come from your output activation function. That code use dense vector with 1 neuron without activation function. So it just multiplying output from previous layer with weight and bias and sum them together. The output that you get will have a range between -infinity(negative class) and +infinity(positive class), Therefore if you really want your output between zero and one you need an activation function such as sigmoid model.add(tf.keras.layers.Dense(1), activation='sigmoid'). Now we just map every thing to range 0 to 1, so we can classify as negative class if output is less than 0.5(mid point) and vice versa.
Actually your understanding of prediction function is correct. You simply did not add an activation to fit with your assumption, that's why you gat that output instead of value between 0 and 1.
I'm having problems implementing a simple balancing for an H2ORandomForestEstimator, I'm trying to reproduce a simple example found in Darren Cook's book written in R ('Practical Machine Learning with H2O - pag. 107).
Working on the Iris Dataset, firstly I artificially unbalance the target variable cutting out a good share of virginica keeping first 120 rows.
Then I build 3 models, a vanilla one, one where I set balance_classes as True, and a last one where I set balance_classes as True and I input a list for class_sampling_factors to oversample the virginica one. List is [1.0,1.0,2.5], referred to columns sorted alphabetically.
I train them, and then output confusion matrix for train for each one.
I'm expecting an unbalanced output for the first one, and a balanced one for the last two, while I have always the same result. I checked the documentation example in Python, and I can't see anything wrong (I may be tired as well).
This is my code:
data_unb = data[1:120,:] # messing up with target variable
train, valid = data_unb.split_frame([0.8], seed=12345)
m1 = h2o.estimators.random_forest.H2ORandomForestEstimator(seed=12345)
m2 = h2o.estimators.random_forest.H2ORandomForestEstimator(balance_classes=True, seed=12345)
m3 = h2o.estimators.random_forest.H2ORandomForestEstimator(balance_classes=True, class_sampling_factors=[1.0,1.0,2.5], seed=12345)
m1.train(x=list(range(4)),y=4,training_frame=train,validation_frame=valid,model_id='RF_defaults')
m2.train(x=list(range(4)),y=4,training_frame=train,validation_frame=valid,model_id='RF_balanced')
m3.train(x=list(range(4)),y=4,training_frame=train,validation_frame=valid,model_id='RF_class_sampling',)
m1.confusion_matrix(train)
m2.confusion_matrix(train)
m3.confusion_matrix(train)
This is my output:
my confusion matrices (wrong)
this is my expected output.
expected confusion matrices
What am I evidently missing? Thanks in advance.
You're not missing anything. The offset_column is available in H2O Random Forest, but it's not actually functional. The bug is documented here and should be fixed in the next stable release of H2O. Sorry about the confusion!
It should work for the rest of the H2O algos (except XGBoost). If you wanted to try on a GBM, for example, you'd see it working.
I am using the python API of CNTK to train some CNN that I save using the save_model function.
Now I want to run some analysis on my network afterwards. Specifically I want to take a look at the activations of each layer. Obviously I can run my network on some data called img like this:
model.eval(img)
But that will only give me the output of the last Layer in my Network. Is there some easy way to also get the output from the previous layers?
Actually, there is even an example provided for that task: https://github.com/Microsoft/CNTK/tree/master/Examples/Image/FeatureExtraction
Let me give you a short overview about the essential steps:
Important is the name of your node, of which you want to get the output.
# get the node in the graph of which you desire the output
node_in_graph = loaded_model.find_by_name(node_name)
output_nodes = combine([node_in_graph.owner])
# evaluate the node e.g. using a minibatch_source
mb = minibatch_source.next_minibatch(1)
output = output_nodes.eval(mb[features_si])
# access the values as a one dimensional vector
out_values = output[0].flatten()
desired_output = out_values[np.newaxis]
Basically you just do the same like you do anyways with the difference that you retrieve an intermediate node.
With CNTK I have created a network with 2 input neurons and 1 output neuron.
A line in the training file looks like
|features 1.567518 2.609619 |labels 1.000000
Then the network was trained with brain script. Now I want to use the network for predicting values. For example: Input data is [1.82, 3.57]. What ist the output from the net?
I have tried Python with the following code, but here I am new. Code does not work. So my question is: How to pass the input data [1.82, 3.57] to the eval function?
On stackoverflow there are some hints, here and here, but this is too abstract for me.
Thank you.
import cntk as ct
import numpy as np
z = ct.load_model("LR_reg.dnn", ct.device.cpu())
input_data= np.array([1.82, 3.57], dtype=np.float32)
pred = z.eval({ z.arguments[0] : input_data })
print(pred)
Here's the most defensive way of doing it. CNTK can be forgiving if you omit some of this when the network is specified with V2 constructs. Not sure about a network that was created with V1 code.
Basically you need a pair of braces for each axis. Which axes exist in Brainscript? There's a batch axis, a sequence axis and then the static axes of your network. You have one dimensional data so that means the following should work:
input_data= np.array([[[1.82, 3.57]]], dtype=np.float32)
This specifies a batch of one sequence, of length one, containing one 1d vector of two elements. You can also try omitting the outermost braces and see if you are getting the same result.
Update based on more information from the comment below, we should not forget that the V1 code also saved the part of the network that computes things like loss and accuracy. If we provide only the features, CNTK will complain that the labels have not been provided. There are two ways to deal with this issue. One possibility is to provide some fake labels, so that the network can evaluate these auxiliary operations. Another possibility is to identify the prediction and use that. If the prediction was called 'p' in V1, this python code
p = z.find_by_name('p')
should create a CNTK function that only needs the features in order to compute the prediction.