I am relatively new to machine learning, especially when it comes to implementing algorithms. I am using Python and the TensorFlow library to implement a neural network and train it on a dataset with about 20 classes. I am able to train and get predictions successfully, but I have a question:
Is it possible to get the top k classes along with their probabilities using TensorFlow, instead of just a single prediction?
If it is possible, how can this be done? Thanks for your guidance.
Update 01:
I am adding code of what I am doing. So I build a neural network with 3 layers having tanh, sigmoid, & sigmoid respectively as activation functions for the hidden layers and softmax for output layer. The code for training and prediction is as follows:
y_pred = None
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        # running the training_epoch numbered epoch
        _,cost = sess.run([optimizer,cost_function],feed_dict={X:tr_features,Y:tr_labels})
        cost_history = np.append(cost_history,cost)
    # predict results based on the trained model
    y_pred = sess.run(tf.argmax(y_,1),feed_dict={X: ts_features})
Right now y_pred is a list of class labels, one for each test example in ts_features. But instead of a single class label per test example, I am hoping to get the top-k predictions for each example, each accompanied by some kind of probability.
Using tf.nn.top_k():
top_k_values, top_k_indices = tf.nn.top_k(predictions, k=k)
If predictions is a vector of probabilities per class (i.e. predictions[i] = prediction probability for class i), then top_k_values will contain the k highest probabilities in predictions, and top_k_indices will contain the indices of these probabilities, i.e. the corresponding classes.
Supposing that in your code, y_ is the vector of predicted probabilities per class:
k = 3 # replace with your value
# Instead of `y_pred`:
y_k_probs, y_k_pred = sess.run(
    tf.nn.top_k(y_, k=k), feed_dict={X: ts_features})
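To make the shape of the result concrete, here is a minimal plain-Python sketch of what a top-k selection returns for a small batch. It does not call TensorFlow; the helper top_k and the probabilities below are made up for illustration, and the sketch only mirrors the (values, indices) pair described above.

```python
def top_k(probabilities, k):
    """Return (values, indices) of the k largest entries of each row, largest first."""
    values, indices = [], []
    for row in probabilities:
        # Sort class indices by descending probability and keep the first k.
        order = sorted(range(len(row)), key=lambda i: -row[i])[:k]
        indices.append(order)
        values.append([row[i] for i in order])
    return values, indices


# Hypothetical softmax outputs for two test examples over four classes.
probs = [
    [0.10, 0.60, 0.05, 0.25],
    [0.30, 0.20, 0.45, 0.05],
]

top_values, top_indices = top_k(probs, k=2)
print(top_indices)  # [[1, 3], [2, 0]]
print(top_values)   # [[0.6, 0.25], [0.45, 0.3]]
```

Each row of top_indices lists the k most likely classes for one example, and the matching row of top_values holds their probabilities in the same order.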
While I'm able to understand how to use model.fit(x_train, y_train), I can't figure out how to make predictions on new data using tensorflow's gradient tape. My github repository with runnable code (up to an error) can be found here. What is currently working is that I get the trained model "network_output", however it appears that with gradient tape, argmax is being used on the model itself, where I'm used to model.fit() taking the test data as an input:
network_output = trained_network(input_images,input_number)
preds = np.argmax(network_output, axis=1)
Here "input_images" is an ndarray of shape (20,3,3,1) and "input_number" is an ndarray of shape (20,5).
Now I'm taking network_output as the trained model and would like to use it to predict similarly typed data of test_images, and test_number respectively.
I get the error 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'predict' here:
predicted_number = network_output.predict(test_images)
This is because I don't know how to use the tape to make predictions. However, once the prediction works, I would guess I can compare the resulting "predicted_number" against "test_number", as would usually be done with a model trained using model.fit:
acc = 0
for i in range(len(test_images)):
    if (predicted_number[i] == test_number[i]):
        acc += 1
print("Accuracy: ", acc / len(input_images) * 100, "%")
To obtain predictions, I usually iterate through the batches manually like this:
predictions = []
for batch in range(num_batch):
    logits = trained_network(x_test[batch * batch_size: (batch + 1) * batch_size], training=False)
    # first obtain probabilities
    # (if the last layer of the network has no activation, otherwise skip the softmax here)
    prob = tf.nn.softmax(logits)
    # putting back together predictions for all batches
    predictions.extend(tf.argmax(input=prob, axis=1))
If you don't have a lot of data you can skip the loop. This is faster than using predict because you directly invoke the __call__ method of the model:
logits = trained_network(x_test, training=False)
prob = tf.nn.softmax(logits)
predictions = tf.argmax(input=prob, axis=1)
Finally, you could also use predict. In this case the batches are handled automatically. It is easier to use when you have lots of data, since you don't have to write a loop to iterate through the batches. The result is a numpy array of the model's outputs (apply np.argmax along axis 1 to turn them into class labels). It can be used like this:
predictions = trained_network.predict(x_test) # you can set a batch_size if you want
What you're doing wrong is this part:
network_output = trained_network(input_images,input_number)
predicted_number = network_output.predict(test_images)
You have to call predict directly on your model trained_network.
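The manual batching and the accuracy check from the question can be sketched without TensorFlow as follows. This is only an illustration under stated assumptions: predict_in_batches, accuracy and toy_model are hypothetical names, and toy_model stands in for a trained network that returns one list of class scores per example.

```python
def predict_in_batches(model_fn, inputs, batch_size):
    """Call model_fn on consecutive slices of inputs and collect the argmax of each row."""
    predictions = []
    # Stepping by batch_size also covers a final, smaller batch.
    for start in range(0, len(inputs), batch_size):
        batch = inputs[start:start + batch_size]
        scores = model_fn(batch)  # one list of class scores per example
        predictions.extend(max(range(len(row)), key=row.__getitem__) for row in scores)
    return predictions


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


def toy_model(batch):
    """Hypothetical stand-in for a trained model: favours class 1 for positive inputs."""
    return [[0.2, 0.8] if x > 0 else [0.9, 0.1] for x in batch]


x_test = [-2.0, 3.0, 1.5, -0.5, 4.0]
y_test = [0, 1, 1, 1, 1]

preds = predict_in_batches(toy_model, x_test, batch_size=2)
print(preds)                    # [0, 1, 1, 0, 1]
print(accuracy(preds, y_test))  # 0.8
```

Note that the accuracy is divided by the number of test examples, which is the set the predictions were made on.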
Hi, I'm trying to build a simple neural network with tensorflow. I give the model training_data, which contains the standard values, and target_data, which is the result I want it to produce when the input is near one of those values.
For example, if I give y_test a value of 3.5, the model should predict a number close to 4, so the condition would say it was a light smoker. I searched a bit for activation functions and learned that I can't use sigmoid for what I want to do. I'm quite new to this. What I've done so far has been by trial and error.
import random
import tensorflow as tf
import numpy as np
training_data=[]
for i in range(0,5):
    training_data.append([random.uniform(0,0.2944)])
for i in range(0,5):
    training_data.append([random.uniform(0.2944,1.7394)])
for i in range(0,5):
    training_data.append([random.uniform(1.7394,3.2394)])
for i in range(0,5):
    training_data.append([random.uniform(3.2394,6)])
target_data=[]
for i in range(0,5):
    target_data.append([1])
for i in range(0,5):
    target_data.append([2])
for i in range(0,5):
    target_data.append([3])
for i in range(0,5):
    target_data.append([4])
y_test= np.array([100])
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(len(target_data),input_dim=1,activation='softmax'))
model.add(tf.keras.layers.Dense(1,activation='relu'))
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['accuracy'])
training_data = np.asarray(training_data)
target_data = np.asarray(target_data)
model.fit(training_data, target_data, epochs=50, verbose=0)
target_pred= model.predict(y_test)
target_pred=float(target_pred)
print("X=%s, Predicted=%s" % (y_test, target_pred))
if( 0<= target_pred <= 1.5):
    print("\nNon-Smoker")
elif(1.5<= target_pred <2.5):
    print("\nPassive Smoker")
elif(2.5<= target_pred <3.5 ):
    print("Light Smoker")
else:
    print("Smoker\n")
Here is a helpful guide to using activation functions in the final layer as well as corresponding losses for different type of problems.
In your case, I am assuming you are working with a regression task with arbitrary values (any float value as output, not restricted between 0 to 1 or -1 to 1). So, skip the activation function and keep mse or mean_squared_error as your loss function.
EDIT:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(3,input_shape=(1,),activation='relu'))
model.add(tf.keras.layers.Dense(1))
You are defining your problem as a regression problem, where the result of model.predict is a continuous value. In that situation the last layer of your model is a linear layer without an activation function, and mse is a suitable loss.
Alternatively, you could define your problem as a classification problem with one class per category: Non_Smoker, Passive_Smoker, Light_Smoker and Smoker. In that case the target data used in training is not a number in the numerical sense but an integer that indicates which class the training sample belongs to. For example, Non_Smoker could have the label 0, Passive_Smoker the label 1, Light_Smoker the label 2 and Smoker the label 3. The last layer of your model would then have one unit per class and use a softmax activation function. In model.compile the loss would be sparse_categorical_crossentropy because your labels are integers. If you one-hot encode your labels instead, for example Non_Smoker as 1000, Passive_Smoker as 0100, Light_Smoker as 0010 and Smoker as 0001, the loss function would be categorical_crossentropy.
When you run model.predict on a test sample, it produces a list of probabilities, one per class, in label order: the first is the probability of class 0 (Non_Smoker), the second of class 1 (Passive_Smoker), and so on. Use np.argmax to find the index with the highest probability; that index is the model's prediction.
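As a small illustration of the classification framing, the following plain-Python sketch maps a vector of class probabilities to a category name and checks that integer labels and one-hot labels give the same cross-entropy for one example. It uses the four categories from the question's code; the probabilities and the helper functions are made up for illustration and are not the Keras implementations.

```python
import math

# Label order is an assumption: index 0 is Non-Smoker, index 3 is Smoker.
CLASS_NAMES = ["Non-Smoker", "Passive Smoker", "Light Smoker", "Smoker"]


def sparse_cross_entropy(label, probs):
    """Cross-entropy for an integer label: minus the log probability of the true class."""
    return -math.log(probs[label])


def categorical_cross_entropy(one_hot, probs):
    """Cross-entropy for a one-hot label: minus the sum of y_true * log(y_pred)."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


# Hypothetical softmax output for one test sample.
probs = [0.05, 0.15, 0.70, 0.10]

# argmax: the index with the highest probability is the predicted class.
predicted = max(range(len(probs)), key=probs.__getitem__)
print(CLASS_NAMES[predicted])  # Light Smoker

# The same true class, written once as an integer and once as a one-hot vector.
loss_sparse = sparse_cross_entropy(2, probs)
loss_onehot = categorical_cross_entropy([0, 0, 1, 0], probs)
print(math.isclose(loss_sparse, loss_onehot))  # True
```

The two losses differ only in how the label is encoded, which is why the choice between them follows from the label format.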
I want to set up a keras model (tensorflow backend) for a multi-class classification problem with 4 different classes. I have both labeled and unlabeled data.
I have worked out the case in which I train only with the labeled data, and my model looks something like this:
# create model
inputs = keras.Input(shape=(len(config.variables), ))
X = layers.Dense(units=200, activation="relu")(inputs)
output = layers.Dense(units=4, activation="softmax", name="output")(X)
model = keras.Model(inputs=inputs, outputs=output)
model.compile(optimizer=optimizers.Adam(1e-4), loss=loss_function, metrics=["accuracy"])
# train model
model.fit(
    x=train_data,
    y=train_class_labels,
    batch_size=200,
    epochs=200,
    verbose=2,
    validation_split=0.2,
    sample_weight=class_weights
)
I have working models with two different losses, namely categorical_crossentropy and sparse_categorical_crossentropy. Depending on the loss function, my train_class_labels were in one-hot representation (e.g. [ [0,1,0,0], [0,0,0,1], ...]) or in integer representation (e.g. [0,0,2,1,0,3, ...]), and everything worked fine. class_weights is some weight vector ([0.78, 1.34, ...]).
Now for my further plans I need to include the unlabeled data in the training process but I need it to be ignored by the loss function.
What I have tried:
Setting the labels of the unlabeled data to [0,0,0,0] when using categorical_crossentropy as the loss, because I thought my unlabeled data would then be ignored by the loss function. Somehow this changed the predictions after training.
I also tried setting the weights of the unlabeled data to 0, but that did not work either.
I concluded that I need to somehow mark my unlabeled data and customize my loss function so that it can be told to ignore those samples. Something like:
def custom_loss(y_true, y_pred):
    if y_true == labeled data:
        return normal loss function
    if y_true == unlabeled data:
        return 0
Those are some snippets that I have found but they do not seem to work:
def custom_loss(y_true, y_pred):
    loss = losses.sparse_categorical_crossentropy(y_true, y_pred)
    return K.switch(K.flatten(K.equal(y_true, -1)), K.zeros_like(loss), loss)
def custom_loss2(y_true, y_pred):
    idx = tf.not_equal(y_true, -1)
    y_true = tf.boolean_mask(y_true, idx)
    y_pred = tf.boolean_mask(y_pred, idx)
    return losses.sparse_categorical_crossentropy(y_true, y_pred)
In those examples I set the labels from the unlabeled data to -1 so train_class_labels would look something like this: [0,-1,2,0,3, ... ]
But when using the first loss function I just get NaNs, and when using the second one I get the following error:
Invalid argument: logits and labels must have the same first dimension, got logits shape [1,5000] and labels shape [5000]
I think that setting the labels to [0,0,0,0] should be just fine with categorical_crossentropy, because the loss of an instance is the sum over classes of -y_true * log(y_pred), so an instance whose label is all zeros contributes a loss of 0.
I don't understand why you are including unlabeled data in training in a supervised setting.
I think the differences you observe are due to the batch size and the gradient step. If some instances in a batch do not contribute to the gradient, the loss computed for that batch differs from before, and that changes the predictions.
Basically, there are fewer informative instances per batch.
If you set the batch size to the size of the whole dataset, there should be no difference from training without the unlabeled instances (provided that training also used batch size = size of the dataset).
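The argument above can be checked numerically with a short plain-Python sketch. It assumes the per-example cross-entropy is the sum over classes of -y_true * log(y_pred) and that the batch loss is the plain average of the per-example losses; the probabilities are invented and the helper is not the Keras implementation.

```python
import math


def categorical_cross_entropy(one_hot, probs):
    """Per-example loss: minus the sum of y_true * log(y_pred) over the classes."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


# A hypothetical batch: rows 0 and 2 are labeled, rows 1 and 3 are "unlabeled" (all zeros).
batch_labels = [[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
batch_probs = [
    [0.10, 0.60, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.10, 0.10, 0.70],
    [0.40, 0.30, 0.20, 0.10],
]

losses = [categorical_cross_entropy(t, p) for t, p in zip(batch_labels, batch_probs)]

# All-zero label rows contribute exactly zero loss.
print(losses[1] == 0 and losses[3] == 0)  # True

# Averaging over the whole batch dilutes the loss compared to averaging over labeled rows only.
mean_all = sum(losses) / len(losses)
labeled_losses = [l for l, t in zip(losses, batch_labels) if any(t)]
mean_labeled = sum(labeled_losses) / len(labeled_losses)
print(round(mean_all, 4), round(mean_labeled, 4))  # 0.2169 0.4338
```

With half of the batch unlabeled, the batch average is half of the labeled-only average, so the gradient step is scaled even though the unlabeled rows themselves add no loss.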
I am new to Python and have been performing text classification with tensorflow. I would like to know whether this text classification model can be updated with new data that I acquire in the future, so that I do not have to train the model from scratch. Also, the number of classes may grow over time, since I am mostly dealing with customer data. Is it possible to update the existing text classification model with data containing more classes by using the existing checkpoints?
Since you are asking two different questions, I'll answer them separately:
1) Yes, you can continue training with the new data you have acquired. This is simple: restore your model as you already do to use it, and instead of running an output op such as outputs or prediction, run the optimizer operation.
This translates into the following code:
model = build_model() # this is the function that builds the model graph
saver = tf.train.Saver()
with tf.Session() as session:
    saver.restore(session, "/path/to/model.ckpt")
    ########### keep training #########
    data_x, data_y = load_new_data(new_data_path)
    for epoch in range(1, epochs+1):
        all_losses = list()
        num_examples = 0
        for b_x, b_y in batchify(data_x, data_y):
            _, loss = session.run([model.opt, model.loss], feed_dict={model.input: b_x, model.input_y: b_y})
            # weight each batch loss by the batch size so the epoch average is per example
            all_losses.append(loss * len(b_x))
            num_examples += len(b_x)
        print("epoch %d - loss: %.2f" % (epoch, sum(all_losses) / num_examples))
Note that you need to know the names of the operations defined by the model in order to run the optimizer (model.opt) and the loss op (model.loss), so that you can train and monitor the loss during training.
2) Changing the number of labels is a bit more complicated. If your network is a single feed-forward layer, there is not much you can do: the dimensionality of the weight matrix changes, so you need to retrain everything from scratch. On the other hand, if you have a multi-layer network (e.g. an LSTM plus a dense layer that does the classification), you can restore the weights of the old model and train only the last layer from scratch. To do that I recommend reading this answer: https://stackoverflow.com/a/41642426/4186749
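One way to picture "restore everything except the last layer" is to compare variable shapes between the old checkpoint and the new model. The sketch below is plain Python with made-up variable names and shapes; split_restorable is a hypothetical helper, not a TensorFlow function. In TF1-style code, the names in the first list would be the ones to pass to a saver restricted with its var_list argument, which is an assumption about how you would wire it up.

```python
def split_restorable(old_shapes, new_shapes):
    """Split the new model's variables into those whose shape matches the old
    checkpoint (can be restored) and those that must be trained from scratch."""
    restore, fresh = [], []
    for name, shape in new_shapes.items():
        if old_shapes.get(name) == shape:
            restore.append(name)
        else:
            fresh.append(name)
    return restore, fresh


# Hypothetical variables of a model trained on 20 classes...
old_shapes = {
    "lstm/kernel": (128, 256),
    "lstm/bias": (256,),
    "dense/kernel": (64, 20),
    "dense/bias": (20,),
}
# ...and of the same architecture rebuilt for 25 classes.
new_shapes = {
    "lstm/kernel": (128, 256),
    "lstm/bias": (256,),
    "dense/kernel": (64, 25),
    "dense/bias": (25,),
}

restore, fresh = split_restorable(old_shapes, new_shapes)
print(restore)  # ['lstm/kernel', 'lstm/bias']
print(fresh)    # ['dense/kernel', 'dense/bias']
```

Only the classification layer changes shape when classes are added, so it is the only part that has to start from a fresh initialization.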
I am training a deep neural network multi-class classifier using TensorFlow. The network outputs the linear values from the final layer, which the tf.nn.softmax_cross_entropy_with_logits cost function takes as input. However, I don't really care about that linear output per se - I want to know what it looks like when the softmax function is applied to it.
Below are the relevant parts of my code:
def train_network(x, num_hidden_layers):
    prediction = neural_network_model(x, num_hidden_layers)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=0.01).minimize(cost)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # train the network
        ...
        # get the network output; x_test is my test data (len=663)
        output = sess.run(prediction,feed_dict={x: x_test})
        # get softmax values of output
        for i in range(len(x_test)):
            softm = sess.run(tf.nn.softmax(output[i]))
            pred_class = sess.run(tf.argmax(softm))
            print(pred_class)
        ...
Now, that final for-loop in which I calculate the softmax values is extremely slow. Why is that, and how do I do this properly?