I am using the Python API of CNTK to train a CNN, which I save using the save_model function.
Now I want to run some analysis on my network afterwards. Specifically I want to take a look at the activations of each layer. Obviously I can run my network on some data called img like this:
model.eval(img)
But that will only give me the output of the last layer in my network. Is there an easy way to also get the output of the previous layers?
Actually, there is even an example provided for that task: https://github.com/Microsoft/CNTK/tree/master/Examples/Image/FeatureExtraction
Let me give you a short overview of the essential steps:
The important part is the name of the node whose output you want to get.
from cntk import combine
import numpy as np

# get the node in the graph of which you desire the output
node_in_graph = loaded_model.find_by_name(node_name)
output_nodes = combine([node_in_graph.owner])
# evaluate the node e.g. using a minibatch_source
mb = minibatch_source.next_minibatch(1)
output = output_nodes.eval(mb[features_si])
# access the values as a one dimensional vector
out_values = output[0].flatten()
desired_output = out_values[np.newaxis]
Basically, you do the same thing you would do anyway; the only difference is that you retrieve an intermediate node and evaluate that instead of the full model.
For one of my first attempts at using TensorFlow I followed the binary text classification tutorial https://www.tensorflow.org/tutorials/keras/text_classification_with_hub#evaluate_the_model.
I was able to follow the tutorial fine, but then I wanted to try to inspect the results more closely, namely I wanted to see what predictions the model made for each item in the test data set.
In short, I wanted to see what "label" (1 or 0) it would predict applies to a given movie review.
So I tried:
results = model.predict(test_data.batch(512))
and then
for i in results:
    print(i)
This gives me something close to what I would expect: a list of 25,000 entries (one for each movie review).
But the value of each item in the array is not what I would expect. I was expecting to see a predicted label, so either a 0 (for negative) or 1 (for positive).
But instead I get this:
[0.22731477]
[2.1199656]
[-2.2581818]
[-2.7382329]
[3.8788114]
[4.6112833]
[6.125982]
[5.100685]
[1.1270659]
[1.3210837]
[-5.2568426]
[-2.9904163]
[0.17620209]
[-1.1293088]
[2.8757455]
...and so on for 25,000 entries.
Can someone help me understand what these numbers mean?
Am I misunderstanding what the "predict" method does, or (since these numbers look similar to the word embedding vectors introduced in the first layer of the model) am I perhaps misunderstanding how the prediction relates to the word embedding layer and the ultimate classification label?
I know this is a major newbie question, but I appreciate your help and patience :)
According to the tutorial you linked, the cause is the output layer: it is a Dense layer with a single unit and no activation function. It simply multiplies the output of the previous layer by its weights, sums the result and adds the bias, so the values you get are raw scores (logits) that can range from -infinity (strongly negative class) to +infinity (strongly positive class). If you want the output to lie between 0 and 1, you need an activation function such as sigmoid: model.add(tf.keras.layers.Dense(1, activation='sigmoid')). If the loss was configured to expect logits (from_logits=True), as in that tutorial, it would have to be changed to from_logits=False as well. Everything is then mapped to the range 0 to 1, so you can classify an item as negative if the output is less than 0.5 (the midpoint) and positive otherwise.
Your understanding of the predict function is correct. The model simply has no output activation that would match your assumption, which is why you got those values instead of values between 0 and 1.
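To make the mapping concrete, here is a minimal sketch in plain Python (no TensorFlow needed). The helper names sigmoid and logits_to_labels are made up for illustration and are not part of the tutorial. It squashes a few of the raw values from the question into probabilities and thresholds them at 0.5.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function: maps any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def logits_to_labels(logits, threshold=0.5):
    """Turn raw model outputs (logits) into (probability, label) pairs."""
    pairs = []
    for z in logits:
        p = sigmoid(z)
        pairs.append((p, 1 if p >= threshold else 0))
    return pairs

# A few of the raw values printed in the question.
raw = [0.22731477, 2.1199656, -2.2581818, -2.7382329, 3.8788114]
for z, (p, label) in zip(raw, logits_to_labels(raw)):
    print(f"logit={z:+.4f}  probability={p:.4f}  label={label}")
```

Since a logit of 0 corresponds to a probability of exactly 0.5, thresholding the raw outputs at 0 yields the same labels without changing the model.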
I have two models (model_a,model_b) of similar structure (VGG16 architecture with a replaced top block). I need to concatenate the outputs of the last layers of both models in order to send as input to an attention mechanism.
I run the following line of code for concatenation:
merged = Concatenate()([model_a.layers[-1].layers[-1], model_b.layers[-1].layers[-1]])
(model_a.layers[-1] is the top block which is a Sequential object, model_a.layers[-1].layers[-1] is a Dense layer.)
However, I receive the following error when I try to do so:
Layer concatenate_8 was called with an input that isn't a symbolic
tensor. Received type: < class 'keras.layers.core.Dense' >. Full input:
[< keras.layers.core.Dense object at 0x >,
< keras.layers.core.Dense object at 0x >]. All inputs to the
layer should be tensors.
I noticed that similar issues are fixed by redefining the last layer by specifying the input layer for it, but I'm not sure how that solution would help here since I'm using predefined and pre-trained models.
Use the .output attribute of the layer to access its symbolic tensor; Concatenate expects tensors, not Layer objects.
merged = Concatenate()(
[model_a.layers[-1].layers[-1].output, model_b.layers[-1].layers[-1].output])
The tensorflow.keras.layers.Layer documentation states the following about Layer.output:
output: Retrieves the output tensor(s) of a layer.
Only applicable if the layer has exactly one output, i.e. if it is connected to one incoming layer.
Regarding the comment
Actually, when I tried to finally combine inputs and outputs using the Model() method, I had an error - Graph disconnected: Cannot get value of tensor Tensor Flatten.. , however when I changed model_a.layers[-1].layers[-1].output to model_a.output and similarly for model_b, the issue was resolved. Any idea why so?
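It's difficult to say without seeing the model code, but you can compare model_a.layers[-1].layers[-1].output with model_a.output. A likely cause is that the Dense layer's .output tensor belongs to the graph of the nested Sequential block and is not connected to the input of the outer model, whereas model_a.output is the tensor produced when the outer model calls that block on its own input.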
I currently have a keras model which uses an Embedding layer. Something like this:
input = tf.keras.layers.Input(shape=(20,), dtype='int32')
x = tf.keras.layers.Embedding(input_dim=1000,
                              output_dim=50,
                              input_length=20,
                              trainable=True,
                              embeddings_initializer='glorot_uniform',
                              mask_zero=False)(input)
This is great and works as expected. However, I want to be able to send text to my model, have it preprocess the text into integers, and continue normally.
Two issues:
1) The Keras docs say that Embedding layers can only be used as the first layer in a model: https://keras.io/layers/embeddings/
2) Even if I could add a Lambda layer before the Embedding, I'd need it to keep track of certain state (like a dictionary mapping specific words to integers). How might I go about this stateful preprocessing?
In short, I need to modify the underlying Tensorflow DAG, so when I save my model and upload to ML Engine, it'll be able to handle my sending it raw text.
Thanks!
Here are the first few layers of a model which uses a string input:
input = keras.layers.Input(shape=(1,), dtype="string", name='input_1')
lookup_table_op = tf.contrib.lookup.index_table_from_tensor(
    mapping=vocab_list,
    num_oov_buckets=num_oov_buckets,
    default_value=-1,
)
lambda_output = Lambda(lookup_table_op.lookup)(input)
emb_layer = Embedding(int(number_of_categories), int(number_of_categories**0.25))(lambda_output)
Then you can continue the model as you normally would after an embedding layer. This is working for me and the model trains fine from string inputs.
It is recommended that you do the string -> int conversion in some preprocessing step to speed up the training process. Then after the model is trained you create a second keras model that just converts string -> int and then combine the two models to get the full string -> target model.
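For intuition, here is a rough pure-Python sketch of the string -> int step that the lookup table performs. It is not TensorFlow's implementation: build_lookup is a hypothetical helper, the vocabulary is invented, and zlib.crc32 only stands in for whatever hash the real table uses for out-of-vocabulary words.

```python
import zlib

def build_lookup(vocab_list, num_oov_buckets):
    """Return a function mapping a word to an integer id.

    Known words get their position in vocab_list; unknown words are hashed
    into one of num_oov_buckets ids placed right after the vocabulary.
    """
    word_to_id = {word: i for i, word in enumerate(vocab_list)}
    vocab_size = len(vocab_list)

    def lookup(word):
        if word in word_to_id:
            return word_to_id[word]
        if num_oov_buckets == 0:
            return -1  # plays the role of default_value=-1
        # crc32 is only a stand-in for the hash used by the real table.
        bucket = zlib.crc32(word.encode("utf-8")) % num_oov_buckets
        return vocab_size + bucket

    return lookup

vocab_list = ["good", "bad", "movie"]  # hypothetical vocabulary
lookup = build_lookup(vocab_list, num_oov_buckets=2)
ids = [lookup(w) for w in ["good", "movie", "unseen"]]
print(ids)  # known words map to 0 and 2, the unknown word to 3 or 4
```

With out-of-vocabulary buckets in use, an embedding fed with these ids needs at least len(vocab_list) + num_oov_buckets rows.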
I am using the new tf.estimator.WarmStartSettings to initialize my network from a previous checkpoint. I now want to run the same network on a new data source, with other vocabs to use for the embeddings.
This snippet from the documentation page of WarmStartSettings seems to describe my use case:
Warm-start all weights but the embedding parameters corresponding to
sc_vocab_file have a different vocab from the one used in the current
model:
vocab_info = ws_util.VocabInfo(
    new_vocab=sc_vocab_file.vocabulary_file,
    new_vocab_size=sc_vocab_file.vocabulary_size,
    num_oov_buckets=sc_vocab_file.num_oov_buckets,
    old_vocab="old_vocab.txt"
)
ws = WarmStartSettings(
    ckpt_to_initialize_from="/tmp",
    var_name_to_vocab_info={
        "input_layer/sc_vocab_file_embedding/embedding_weights": vocab_info
    })
tf.estimator.VocabInfo lets you specify the old and the new vocab with their respective sizes. However, when I try to use the WarmStartSettings as shown above with 2 vocabs of different sizes, I get the following error:
ValueError: Shape of variable input_layer/sc_vocab_file_embedding/embedding_weights
((1887, 30)) doesn't match with shape of tensor
input_layer/sc_vocab_file_embedding/embedding_weights ([537, 30]) from checkpoint reader.
Why does VocabInfo allow you to provide separate sizes for the vocabs if their sizes have to match anyway?
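For readers unfamiliar with the idea, the following is only a conceptual sketch of what remapping embedding rows between two vocabularies of different sizes means. It is not TensorFlow's implementation and does not explain the error above; remap_embeddings, the toy vocabularies and the random initialisation range are all assumptions made for illustration.

```python
import random

def remap_embeddings(old_vocab, old_rows, new_vocab, dim, seed=0):
    """Build an embedding table for new_vocab, reusing rows from old_vocab.

    Words present in both vocabularies keep their trained vector; words that
    only exist in new_vocab get a small random vector.
    """
    rng = random.Random(seed)
    old_index = {word: i for i, word in enumerate(old_vocab)}
    new_rows = []
    for word in new_vocab:
        if word in old_index:
            # Copy the trained row for a word known to the old model.
            new_rows.append(list(old_rows[old_index[word]]))
        else:
            # Fresh initialisation for a word the old model never saw.
            new_rows.append([rng.uniform(-0.05, 0.05) for _ in range(dim)])
    return new_rows

old_vocab = ["cat", "dog"]              # 2 rows in the old checkpoint
old_rows = [[1.0, 1.0], [2.0, 2.0]]
new_vocab = ["dog", "bird", "cat"]      # 3 rows in the new model
new_rows = remap_embeddings(old_vocab, old_rows, new_vocab, dim=2)
print(len(new_rows), new_rows[0], new_rows[2])
```

In this toy version the resulting table has one row per entry of the new vocabulary, regardless of how many rows the old table had.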
With CNTK I have created a network with 2 input neurons and 1 output neuron.
A line in the training file looks like
|features 1.567518 2.609619 |labels 1.000000
Then the network was trained with BrainScript. Now I want to use the network for predicting values. For example: the input data is [1.82, 3.57]. What is the output of the net?
I have tried the following Python code, but I am new to Python and the code does not work. So my question is: how do I pass the input data [1.82, 3.57] to the eval function?
On Stack Overflow there are some hints, here and here, but they are too abstract for me.
Thank you.
import cntk as ct
import numpy as np
z = ct.load_model("LR_reg.dnn", ct.device.cpu())
input_data= np.array([1.82, 3.57], dtype=np.float32)
pred = z.eval({ z.arguments[0] : input_data })
print(pred)
Here's the most defensive way of doing it. CNTK can be forgiving if you omit some of this when the network is specified with V2 constructs. Not sure about a network that was created with V1 code.
Basically you need a pair of square brackets for each axis. Which axes exist in BrainScript? There's a batch axis, a sequence axis and then the static axes of your network. You have one-dimensional data, so the following should work:
input_data= np.array([[[1.82, 3.57]]], dtype=np.float32)
This specifies a batch of one sequence, of length one, containing one 1-d vector of two elements. You can also try omitting the outermost brackets and see whether you get the same result.
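The nesting can be checked without CNTK. The sketch below uses plain Python lists and a made-up helper, nested_shape, to show how each pair of brackets adds one axis.

```python
def nested_shape(value):
    """Return the shape of a regularly nested list as a tuple."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)

sample = [1.82, 3.57]   # static axis: one 1-d vector of two elements
sequence = [sample]     # sequence axis: a sequence of length one
batch = [sequence]      # batch axis: a batch holding one sequence

print(nested_shape(sample))    # (2,)
print(nested_shape(sequence))  # (1, 2)
print(nested_shape(batch))     # (1, 1, 2)
```

Passing batch to np.array with dtype=np.float32 gives the same array as the input_data line above.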
Update, based on more information from the comment below: we should not forget that the V1 code also saved the part of the network that computes things like loss and accuracy. If we provide only the features, CNTK will complain that the labels have not been provided. There are two ways to deal with this issue. One possibility is to provide some fake labels, so that the network can evaluate these auxiliary operations. Another possibility is to identify the prediction node and use only that. If the prediction was called 'p' in V1, this Python code
p = z.find_by_name('p')
should create a CNTK function that only needs the features in order to compute the prediction.