Is it possible to update existing text classification model in tensorflow? - python

I am new to Python and have been performing text classification with tensorflow. I would like to know whether this text classification model can be updated with new data I acquire in the future, so that I would not have to train the model from scratch. Also, since I am mostly dealing with customer data, the number of classes may grow over time. Is it possible to update the existing text classification model with data containing more classes by using the existing checkpoints?

Given that you are asking two different questions, I'll answer each separately:
1) Yes, you can continue training with the new data you have acquired. This is straightforward: restore your model exactly as you do now when using it, but instead of running an output or prediction tensor, run the optimizer operation.
This translates into the following code:
model = build_model()  # the function that builds the model graph
saver = tf.train.Saver()

with tf.Session() as session:
    saver.restore(session, "/path/to/model.ckpt")

    ########### keep training #########
    data_x, data_y = load_new_data(new_data_path)
    for epoch in range(1, epochs + 1):
        total_loss = 0.0
        num_samples = 0
        for b_x, b_y in batchify(data_x, data_y):
            _, loss = session.run([model.opt, model.loss],
                                  feed_dict={model.input: b_x, model.input_y: b_y})
            total_loss += loss * len(b_x)  # weight each batch loss by its size
            num_samples += len(b_x)
        print("epoch %d - loss: %.2f" % (epoch, total_loss / num_samples))
Note that you need to know the names of the operations defined by the model in order to run the optimizer (model.opt) and the loss op (model.loss), so that you can train and monitor the loss.
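If you did not keep Python handles to those ops, you can usually recover them from the restored graph by name. A minimal sketch, assuming the ops were created with the (hypothetical) names 'opt', 'loss', 'input' and 'input_y':
graph = tf.get_default_graph()
opt_op = graph.get_operation_by_name("opt")    # a training op has no output tensor
loss_t = graph.get_tensor_by_name("loss:0")    # ":0" selects the op's first output
input_x = graph.get_tensor_by_name("input:0")
input_y = graph.get_tensor_by_name("input_y:0")
_, loss = session.run([opt_op, loss_t], feed_dict={input_x: b_x, input_y: b_y})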
2) If you want to change the number of labels, it is a bit more complicated. If your network is a single feed-forward layer, there is not much you can do: changing the number of classes changes the dimensionality of the weight matrix, so you have to retrain everything from scratch. On the other hand, if you have a multi-layer network (e.g. an LSTM followed by a dense classification layer), you can restore the weights of the old model and train only the last layer from scratch. For that I recommend reading this answer: https://stackoverflow.com/a/41642426/4186749
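As a rough illustration of that idea, here is a minimal sketch of a partial restore in TF1. It assumes the reusable layers live under a (hypothetical) variable scope 'encoder' and that build_model can be given the new number of classes:
# Build the new graph, with the output layer sized for the new label set.
model = build_model(num_classes=new_num_classes)

# Restore only the encoder variables from the old checkpoint; the new
# output layer keeps its fresh initialization.
encoder_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="encoder")
restorer = tf.train.Saver(var_list=encoder_vars)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())    # initializes everything
    restorer.restore(session, "/path/to/model.ckpt")  # overwrites encoder weights
    # ... continue training as above; optionally pass var_list to the
    # optimizer's minimize() to freeze the restored encoder.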

Related

A Classifier Network Seems to be "Forgetting" older samples

This is a strange problem: Imagine a neural network classifier. It is a simple linear layer followed by a sigmoid activation, with an input size of 64 and an output size of 112. There are also 112 training samples, and for each I expect the output to be a one-hot vector. So the basic structure of a training loop is as follows, where samples is a list of (input_state, index) pairs:
import torch
from torch import nn, optim

model = nn.Sequential(nn.Linear(64, 112), nn.Sigmoid())  # sigmoid, as described above
loss_fn = nn.BCELoss()
optimizer = optim.AdamW(model.parameters(), lr=3e-4)

for epoch in range(500):
    for input_state, index in samples:
        one_hot = torch.zeros(112).float()
        one_hot[index] = 1.0
        optimizer.zero_grad()
        prediction = model(input_state)
        loss = loss_fn(prediction, one_hot)
        loss.backward()
        optimizer.step()
This model does not perform well, but I don't think the problem is the model itself so much as how it is trained. My suspicion is that because the one_hot tensor is mostly zeros, the model simply gravitates toward predicting all zeros, which is exactly what happens. The question becomes: how does this get solved? I tried averaging the loss over all the samples, to no avail. So what do I do?
So this is very embarrassing, but the answer actually lies in how I process my data. This is a text-input project, so I used basic Python lists to create blocks of messages, but in doing so I accidentally made every input the network saw identical, while the target output changed every time. I solved this problem with the list copy method.
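For readers who hit the same thing: this is the classic Python list-aliasing pitfall. Appending the same (mutated) list object repeatedly stores many references to one object; appending a copy stores distinct snapshots. A minimal illustration:
block = []
aliased, copied = [], []
for token in ["a", "b", "c"]:
    block.append(token)
    aliased.append(block)         # every entry references the SAME list
    copied.append(block.copy())   # each entry is a distinct snapshot

print(aliased)  # [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']]
print(copied)   # [['a'], ['a', 'b'], ['a', 'b', 'c']]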

How to generate predictions from new data using trained tensorflow network?

I want to train Google's VGGish network (Hershey et al. 2017) from scratch to predict classes specific to my own audio files.
For this I am using the vggish_train_demo.py script available on their github repo, which uses tensorflow. I've been able to modify the script to extract mel-spectrogram features from my own audio by changing the _get_examples_batch() function, and then to train the model on the output of this function. This runs to completion and prints the loss at each epoch.
However, I've been unable to figure out how to get this trained model to generate predictions from new data. Can this be done with changes to the vggish_train_demo.py script?
For anyone who stumbles across this in the future, I wrote this script, which does the job. You must save log-mel specs for the train and test data in the arrays X_train, y_train, X_test, y_test. X_train/X_test are arrays of (n, 96, 64) features and y_train/y_test are arrays of shape (n, _NUM_CLASSES), where n is the number of 0.96s audio segments and _NUM_CLASSES is the number of classes used.
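As a quick sanity check before running the script, the arrays should look something like this (a minimal sketch with made-up sizes, assuming two classes):
import numpy as np

n = 10  # number of 0.96s audio segments
X_train = np.zeros((n, 96, 64), dtype=np.float32)  # log-mel patches
y_train = np.eye(2, dtype=np.float32)[np.random.randint(0, 2, n)]  # one-hot rows
print(X_train.shape, y_train.shape)  # (10, 96, 64) (10, 2)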
See the function definition statement for more info and the vggish github in my original post:
### Run the network and save the predictions and accuracy at each epoch
### Train NN, output results
r"""This uses the VGGish model definition within a larger model which adds two
layers on top, and then trains this larger model.

We input the log-mel spectrograms (X_train) calculated above with associated
labels (y_train), and feed the batches into the model. Once the model is
trained, it is then executed on the test log-mel spectrograms (X_test), and the
accuracy is output, alongside a .csv file with the predictions for each 0.96s
chunk and their true class."""
# Imports and flags as in the VGGish repo's vggish_train_demo.py; numpy and
# pandas are also needed below. X_train, y_train, X_test, y_test, num_epochs,
# batch_size, col_names and _NUM_CLASSES are assumed to be defined above.
import numpy as np
import pandas as pd
import tensorflow as tf
import vggish_params
import vggish_slim

slim = tf.contrib.slim
flags = tf.app.flags
flags.DEFINE_boolean('train_vggish', True,
                     'Whether to train the VGGish layers as well.')
flags.DEFINE_string('checkpoint', 'vggish_model.ckpt',
                    'Path to the pre-trained VGGish checkpoint.')
FLAGS = flags.FLAGS


def main(_):
    with tf.Graph().as_default(), tf.Session() as sess:
        # Define VGGish.
        embeddings = vggish_slim.define_vggish_slim(training=FLAGS.train_vggish)

        # Define a shallow classification model and associated training ops on
        # top of VGGish.
        with tf.variable_scope('mymodel'):
            # Add a fully connected layer with 100 units. Add an activation
            # function to the embeddings since they are pre-activation.
            num_units = 100
            fc = slim.fully_connected(tf.nn.relu(embeddings), num_units)

            # Add a classifier layer at the end, consisting of parallel logistic
            # classifiers, one per class. Keep the raw logits for the loss
            # below; the sigmoid is only for reading out per-class probabilities.
            logits = slim.fully_connected(
                fc, _NUM_CLASSES, activation_fn=None, scope='logits')
            prediction = tf.sigmoid(logits, name='prediction')
        # Add training ops.
        with tf.variable_scope('train'):
            global_step = tf.train.create_global_step()

            # Labels are assumed to be fed as a batch of multi-hot vectors, with
            # a 1 in the position of each positive class label, and 0 elsewhere.
            labels_input = tf.placeholder(
                tf.float32, shape=(None, _NUM_CLASSES), name='labels')

            # Cross-entropy label loss.
            xent = tf.nn.sigmoid_cross_entropy_with_logits(
                logits=logits, labels=labels_input, name='xent')
            loss = tf.reduce_mean(xent, name='loss_op')
            tf.summary.scalar('loss', loss)

            # We use the same optimizer and hyperparameters as used to train
            # VGGish.
            optimizer = tf.train.AdamOptimizer(
                learning_rate=vggish_params.LEARNING_RATE,
                epsilon=vggish_params.ADAM_EPSILON)
            train_op = optimizer.minimize(loss, global_step=global_step)

        # Initialize all variables in the model, and then load the pre-trained
        # VGGish checkpoint.
        sess.run(tf.global_variables_initializer())
        vggish_slim.load_vggish_slim_checkpoint(sess, FLAGS.checkpoint)

        # The training loop.
        features_input = sess.graph.get_tensor_by_name(
            vggish_params.INPUT_TENSOR_NAME)
        accuracy_scores = []
        for epoch in range(num_epochs):
            epoch_loss = 0
            i = 0
            while i < len(X_train):
                start = i
                end = i + batch_size
                batch_x = np.array(X_train[start:end])
                batch_y = np.array(y_train[start:end])
                _, c = sess.run([train_op, loss],
                                feed_dict={features_input: batch_x,
                                           labels_input: batch_y})
                epoch_loss += c
                i += batch_size
            # Print the epoch number and loss.
            print('Epoch', epoch + 1, 'completed out of', num_epochs,
                  ', loss:', epoch_loss)

            # If these lines are left here, the model is evaluated on the test
            # data after every epoch and the accuracy printed; note this adds a
            # small computational cost.
            # argmax picks the highest-scoring class per example; a prediction
            # is correct when it matches the argmax of the label vector.
            correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels_input, 1))
            accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
            accuracy1 = accuracy.eval({features_input: X_test, labels_input: y_test})
            accuracy_scores.append(accuracy1)
            print('Accuracy:', accuracy1)

            # Save the sigmoid probabilities for the test data.
            predictions_sigm = prediction.eval(feed_dict={features_input: X_test})
            test_preds = pd.DataFrame(predictions_sigm, columns=col_names)
            true_class = np.argmax(y_test, axis=1)   # the true class per chunk
            test_preds['True class'] = true_class
            # Save a csv of the test predictions. NB: np.savetxt does not write
            # the header.
            np.savetxt("/content/drive/MyDrive/..." + "Epoch_" + str(epoch + 1) +
                       "_Accuracy_" + str(accuracy1),
                       test_preds.values, delimiter=",")


if __name__ == '__main__':
    tf.app.run()
    # In a notebook, an 'An exception has occurred, use %tb to see the full
    # traceback.' message may appear when tf.app.run() exits; fear not, it just
    # means the script has finished.
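To answer the original question directly: once the session has been trained, generating predictions for new, unlabeled audio is just another evaluation of the prediction tensor. A minimal sketch, assuming X_new holds new (n, 96, 64) log-mel patches and that this runs inside the same session, after the training loop:
probs = sess.run(prediction, feed_dict={features_input: X_new})  # (n, _NUM_CLASSES)
predicted_class = np.argmax(probs, axis=1)  # highest-probability class per chunk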

Training and testing CNN with pytorch. With and without model.eval()

I have two questions:
I am trying to train a convolutional neural network initialized with some pre-trained weights (the network contains batch normalization layers as well) (taking reference from here). Before training I want to calculate a validation error using loss_fn = torch.nn.MSELoss().cuda().
In the reference, the author uses model.eval() before calculating the validation error. But with that, the CNN's output is off from what it should be; however, when I comment out model.eval(), the output is good (what it should be with the pre-trained weights). What could be the reason, given that I have read in many posts that model.eval() should be used before testing a model and model.train() before training it?
While calculating the validation error with the pre-trained weights and the above loss function, what should the batch size be? Shouldn't it be 1, since I want an output for each of my inputs, so I can compute the error against the ground truth and in the end average all the results? If I use a higher batch size, the error increases. So the question is: can I use a higher batch size, and if yes, what is the right way to do it? In the given code I have err = float(loss_local) / num_samples, but I also tried it without averaging, i.e. err = float(loss_local); the error is different for different batch sizes. I am doing this without model.eval() right now.
batch_size = 1
data_path = 'path_to_data'
dtype = torch.FloatTensor
weight_file = 'path_to_weight_file'

val_loader = torch.utils.data.DataLoader(NyuDepthLoader(data_path, val_lists),
                                         batch_size=batch_size, shuffle=True,
                                         drop_last=True)
model = Model(batch_size)
model.load_state_dict(load_weights(model, weight_file, dtype))
loss_fn = torch.nn.MSELoss().cuda()
loss_local = 0
num_samples = 0
# model.eval()
with torch.no_grad():
    for input, depth in val_loader:
        input_var = Variable(input.type(dtype))
        depth_var = Variable(depth.type(dtype))

        output = model(input_var)

        input_rgb_image = input_var[0].data.permute(1, 2, 0).cpu().numpy().astype(np.uint8)
        input_gt_depth_image = depth_var[0][0].data.cpu().numpy().astype(np.float32)
        pred_depth_image = output[0].data.squeeze().cpu().numpy().astype(np.float32)

        print(format(type(depth_var)))

        pred_depth_image_resize = cv2.resize(pred_depth_image, dsize=(608, 456),
                                             interpolation=cv2.INTER_LINEAR)
        target_depth_transform = transforms.Compose([flow_transforms.ArrayToTensor()])
        pred_depth_image_tensor = target_depth_transform(pred_depth_image_resize)

        # both inputs to loss_fn are torch.Tensor
        loss_local += loss_fn(pred_depth_image_tensor, depth_var)

        num_samples += 1
        print('num_samples {}'.format(num_samples))

err = float(loss_local) / num_samples
print('val_error before train:', err)
What could be the reason, given that I have read in many posts that model.eval() should be used before testing a model and model.train() before training it?
Note: testing the model is called inference.
As explained in the official documentation:
Remember that you must call model.eval() to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results.
So this code must be present once you load the model from a file and do inference.
# Model class must be defined somewhere
model = torch.load(PATH)
model.eval()
This is because dropout works as regularization to prevent overfitting during training; it is not needed for inference. The same goes for the batch norms.
When you call eval(), it just sets the module's training flag to False, which only affects certain types of modules, in particular Dropout and BatchNorm.
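A minimal sketch of what that flag changes in practice: a BatchNorm layer normalizes with the current batch statistics in train mode, but with its stored running estimates in eval mode, so the same input gives different outputs:
import torch
from torch import nn

bn = nn.BatchNorm1d(4)
x = torch.randn(8, 4)

bn.train()
out_train = bn(x)  # normalizes with this batch's mean/var, updates running stats

bn.eval()
out_eval = bn(x)   # normalizes with the stored running mean/var

print(torch.allclose(out_train, out_eval))  # typically False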

Get top-k predictions from tensorflow

I am relatively new to machine learning, especially when it comes to implementing algorithms. I am using Python and the tensorflow library to implement a neural network to train on a dataset which has about 20 classes. I am able to train and get predictions successfully, but I have a question:
Is it possible to get the top k classes along with their probabilities using tensorflow, instead of just a single prediction?
If it is possible, how can this be done? Thanks for your guidance.
Update 01:
I am adding the code of what I am doing. I built a neural network with 3 hidden layers, using tanh, sigmoid, and sigmoid respectively as their activation functions, and softmax for the output layer. The code for training and prediction is as follows:
y_pred = None
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        # run one training epoch over the whole training set
        _, cost = sess.run([optimizer, cost_function],
                           feed_dict={X: tr_features, Y: tr_labels})
        cost_history = np.append(cost_history, cost)
    # predict results based on the trained model
    y_pred = sess.run(tf.argmax(y_, 1), feed_dict={X: ts_features})
Right now y_pred is a list of class labels, one for each test example in ts_features. But instead of a single class label per test example, I am hoping to get the top-k predictions for each example, each accompanied by some kind of probability.
Using tf.nn.top_k():
top_k_values, top_k_indices = tf.nn.top_k(predictions, k=k)
If predictions is a vector of probabilities per class (i.e. predictions[i] = prediction probability for class i), then top_k_values will contain the k highest probabilities in predictions, and top_k_indices will contain the indices of these probabilities, i.e. the corresponding classes.
Supposing that in your code, y_ is the vector of predicted probabilities per class:
k = 3  # replace with your value

# Instead of `y_pred`:
y_k_probs, y_k_pred = sess.run(
    tf.nn.top_k(y_, k=k), feed_dict={X: ts_features})
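To turn those indices into human-readable labels, a small follow-up sketch (class_names here is a hypothetical list mapping class id to label):
class_names = ["class_0", "class_1", "class_2"]  # hypothetical id-to-label list
for probs, ids in zip(y_k_probs, y_k_pred):
    ranked = [(class_names[i], p) for i, p in zip(ids, probs)]
    print(ranked)  # e.g. [('class_2', 0.71), ('class_0', 0.20), ('class_1', 0.06)]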

Designing an Efficient neural network for a Regression data [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
By following the Tensorflow tutorials and reading some basics about neural networks, I have modeled a neural network using Python and the Tensorflow library.
As of now, my .csv file data is as follows:
AT V AP RH PE
14.96 41.76 1024.07 73.17 463.26
25.18 62.96 1020.04 59.08 444.37
5.11 39.4 1012.16 92.14 488.56
20.86 57.32 1010.24 76.64 446.48
10.82 37.5 1009.23 96.62 473.9
26.27 59.44 1012.23 58.77 443.67
15.89 43.96 1014.02 75.24 467.35
9.48 44.71 1019.12 66.43 478.42
14.64 45 1021.78 41.25 475.98
.....................................
As of now, I have designed my neural network to handle multiple inputs and multiple outputs. In the above data, I am considering the first three columns as my inputs and the next two columns as my outputs. So, once the model is trained, if I pass the inputs 14.64, 45, 1021.78, I want my neural network to predict the output values 41.25 and 475.98.
Here is my current code:
import tensorflow as tf
import numpy as np
import pandas as pd
#import matplotlib.pyplot as plt

rng = np.random

# Parameters
learning_rate = 0.01
training_epochs = 5000
display_step = 1000
batch_size = 100

# Read data from CSV (raw string so the backslashes are not treated as escapes)
df = pd.read_csv(r"H:\MiniThessis\Sample.csv")

# Separating out dependent & independent variables
train_x = df[['AT', 'V', 'AP']]
train_y = df[['RH', 'PE']]
trainx = train_x.values.astype(np.float32)
trainy = train_y.values.astype(np.float32)

n_input = 3
n_classes = 2
n_hidden_1 = 20
n_hidden_2 = 20
n_samples = len(trainx)
# tf Graph input
# Inserts a placeholder for a tensor that will always be fed.
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

# Set model weights
W_h1 = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
W_h2 = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]))
W_out = tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
b_h1 = tf.Variable(tf.zeros([n_hidden_1]))
b_h2 = tf.Variable(tf.zeros([n_hidden_2]))
b_out = tf.Variable(tf.zeros([n_classes]))

# Construct the model: two ReLU hidden layers and a linear output layer
layer_1 = tf.add(tf.matmul(x, W_h1), b_h1)
layer_1 = tf.nn.relu(layer_1)
layer_2 = tf.add(tf.matmul(layer_1, W_h2), b_h2)
layer_2 = tf.nn.relu(layer_2)
out_layer = tf.matmul(layer_2, W_out) + b_out

# Mean squared error; reduce_mean already averages over the batch, so no
# extra division by n_samples is needed. (The commented-out softmax
# cross-entropy is for classification, not for this regression task.)
#cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_layer, labels=y))
cost = tf.reduce_mean(tf.square(out_layer - y))

# Adam optimizer
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()
# Number of batches per epoch (round up so a final partial batch is included)
batchcount = int(np.ceil(n_samples / batch_size))

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Fit all training data
    for epoch in range(training_epochs):
        avg_cost = 0.
        # Reset the batch window at the start of every epoch
        initialval = 0
        finalval = batch_size
        for batchIdx in range(batchcount):
            # numpy slicing truncates at the end of the array, so the last,
            # possibly smaller, batch needs no special casing
            subtrainx = trainx[initialval:finalval]
            subtrainy = trainy[initialval:finalval]
            _, c = sess.run([optimizer, cost], feed_dict={x: subtrainx, y: subtrainy})
            initialval += batch_size
            finalval += batch_size
            avg_cost += c / batchcount

        # Display logs every display_step epochs
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost))

    best = sess.run([out_layer], feed_dict={x: np.array([[14.96, 41.76, 1024.07]])})
    print(best)
The architecture of my neural network is as follows:
1) The numbers of input nodes (n_input) and output nodes (n_classes) are 3 and 2 respectively.
2) As of now, I am using two hidden layers, each with 20 nodes.
I need help on the following points:
1) How do I select the parameters training_epochs, learning_rate, and batch_size so that I can get better accuracy?
2) I am still not sure about the architecture of my neural network. How many hidden layers is it recommended to use? And how many nodes in each hidden layer?
3) Suppose in my data I want to use the first two columns as inputs and the next three columns as outputs; what changes would I make? Do I need to change my complete architecture?
4) Also, I am not sure about my cost function. Which one is better to use for better accuracy?
5) Please also let me know if I am missing some important parameter worth considering!
Thanks in advance!
The questions you are asking are quite general, and if a universal answer to them were discovered, data scientists would cease to exist, since the process of building machine learning systems would then be automated.
1, 2, 4. The learning rate and the epoch count are, correspondingly, "the less the better" and "the more the better". In practice, however, you also need your machine to finish training before the heat death of the universe sets in.
Usually, you look at the values of the error function, which is your network's score on the dataset you train it on. At first the network learns from the data and the score improves with each epoch. After a while the network has learned all it can from the dataset and the score stops improving; after that there is no point in continuing. You may also quit earlier if the error is already small enough for your purposes.
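A minimal sketch of that stopping rule, assuming a hypothetical train_one_epoch() helper that trains for one epoch and returns the current validation error:
max_epochs, patience, target_err = 5000, 10, 1e-3
best_err, bad_epochs = float('inf'), 0
for epoch in range(max_epochs):
    err = train_one_epoch()  # hypothetical helper, see lead-in
    if err < best_err:
        best_err, bad_epochs = err, 0
    else:
        bad_epochs += 1
    if bad_epochs >= patience or err < target_err:
        break  # score stopped improving, or is already good enough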
The learning rate can also be chosen based on the evolution of the error. The higher the rate, the faster the network finishes learning, but if it is too high the network can overshoot the optimal parameter values and get worse and worse over time. So you need to find a compromise between learning speed and accuracy. It is not uncommon to reduce the learning rate as the epoch counter increases.
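In the TF1 API used in the question, that schedule can be expressed with tf.train.exponential_decay; a minimal sketch that would plug into the question's code:
global_step = tf.Variable(0, trainable=False)
lr = tf.train.exponential_decay(0.01, global_step,
                                decay_steps=1000, decay_rate=0.96, staircase=True)
optimizer = tf.train.AdamOptimizer(lr).minimize(cost, global_step=global_step)
# passing global_step makes minimize() increment it, which drives the decay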
You've also used the word "accuracy", but you first need to define what it means in terms of your problem. (It is tightly related to the cost function, actually.)
The problem you've described falls into the class of linear/nonlinear regression (as opposed to logistic regression), and for such problems the mean squared error (which is used in your code) is customary. Cross-entropy criteria are usually better suited to classification problems. If you have some domain expertise about your data, you can of course devise a specialized cost function for your network.
There is no single answer about the best structure of a network. Its complexity depends on the function you are trying to approximate. If the connection between the input and output data is linear, you don't need anything more than a single neuron.
Generally, deciding on the structure of a network is an optimization problem in itself. So you first set aside some test data with answers, then train your network on the rest of the data, and then see how well it performs on the test data it has not seen before. You test the network, modify it, train it, and test again. Based on its performance on both the training and test sets, you decide how to alter it next.
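A minimal sketch of such a split on the arrays from the question's code, using scikit-learn's train_test_split (an extra dependency, but a common choice):
from sklearn.model_selection import train_test_split

trainx_sub, testx, trainy_sub, testy = train_test_split(
    trainx, trainy, test_size=0.2, random_state=42)
# train on (trainx_sub, trainy_sub) as before, then compare the cost on both:
# sess.run(cost, feed_dict={x: testx, y: testy})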
3. You will need to change the number of inputs and outputs, and you will have to train the network anew.
Also, these and many more basic questions are covered in many courses on the internet; consider taking one. One such course can be found here.
