I recently trying to build a program, that classify Quora (question pair) dataset, whether it's duplicate or not. I got the accuracy and loss based on real y, but IDK how to proceed the output (predicted y) can anyone help me?
the output shud be 1 or 0 (binary class)
This is the sentence merger code, training process use LSTM
merged = RNN(EMBED_HIDDEN_SIZE)(merged)
merged = layers.Dropout(dropoutp)(merged)
preds = layers.Dense(answer_size, activation='sigmoid')(merged)
model = Model([questiona, questionb], preds)
rmsprop = keras.optimizers.rmsprop(lr=lrn)
model.summary()
You can get the predictions by passing the test data to predict funtion
predictions=model.predict(X)
Link to the docs
Related
I have a keras model here which looks as follows:
As you can see, an intent (four classes) is predicted and each word of the sentence is tagged (choce of 10 classes). I'm now struggeling with the model.fit and the y_train data preparation. If I shape it as follows, all works, but it doesn't feel correct as the left output will have the same shape as the right output.
x = np.array(df_ic.message)
y = np.zeros((df_ic.message.size,2,85))
Can anyone help/suggest how to best shape the train data, i.e. y?
Thanks a lot,
Martin
You can create a Keras model with 2 outputs and provide (y1, y2) to it.
As an example, please see https://github.com/ageron/handson-ml3/blob/main/10_neural_nets_with_keras.ipynb
Search for "Adding an auxiliary output for regularization":
"...
model = tf.keras.Model(inputs=[input_wide, input_deep],
outputs=[output, aux_output])
...
history = model.fit(
(X_train_wide, X_train_deep), (y_train, y_train), epochs=20,
validation_data=((X_valid_wide, X_valid_deep), (y_valid, y_valid))
)
..."
This is explained in detail in book "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition" by Aurélien Géron, see pages 333-335.
While I'm able to understand how to use model.fit(x_train, y_train), I can't figure out how to make predictions on new data using tensorflow's gradient tape. My github repository with runnable code (up to an error) can be found here. What is currently working is that I get the trained model "network_output", however it appears that with gradient tape, argmax is being used on the model itself, where I'm used to model.fit() taking the test data as an input:
network_output = trained_network(input_images,input_number)
preds = np.argmax(network_output, axis=1)
Where "input_images" is an ndarray: (20,3,3,1) and "input_number" is an ndarray: (20,5).
Now I'm taking network_output as the trained model and would like to use it to predict similarly typed data of test_images, and test_number respectively.
The error 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'predict' here:
predicted_number = network_output.predict(test_images)
Which is because I don't know how to use the tape to make predictions. However once the prediction works I would guess I can compare the resulting "predicted_number" against the "test_number" as would usually be done using the model.fit method.
acc = 0
for i in range(len(test_images)):
if (predicted_number[i] == test_number[i]):
acc += 1
print("Accuracy: ", acc / len(input_images) * 100, "%")
In order to obtain prediction I usually iterate through batches manually like this:
predictions = []
for batch in range(num_batch):
logits = trained_network(x_test[batch * batch_size: (batch + 1) * batch_size], training=False)
# first obtain probabilities
# (if the last layer of the network has no activation, otherwise skip the softmax here)
prob = tf.nn.softmax(logits)
# putting back together predictions for all batches
predictions.extend(tf.argmax(input=prob, axis=1))
If you don't have a lot of data you can skip the loop, this is faster than using predict because you directly invoke the __call__ method of the model:
logits = trained_network(x_test, training=False)
prob = tf.nn.softmax(logits)
predictions = tf.argmax(input=prob, axis=1)
Finally you could also use predict. In this case the batches are handled automatically. It is easier to use when you have lots of data since you don't have to create a loop to interate through batches. The result is a numpy array of predictions. In can be used like this:
predictions = trained_network.predict(x_test) # you can set a batch_size if you want
What you're doing wrong is this part:
network_output = trained_network(input_images,input_number)
predicted_number = network_output.predict(test_images)
You have to call predict directly on your model trained_network.
I'm beginner learning to build a standard transformer model based on PyTorch to solve an univariate sequence-to-sequence regression problem. The codes are written referring to the tutorial of PyTorch, but it turns out the training/validation error is quite different from the testing error.
During the training, it goes like:
for src, tgt in train_loader:
optimizer.zero_grad()
output = net(src=src, tgt=tgt, device=device)
loss = criterion(output[:,:-1,:], tgt[:,1:,:]) #is this correct?
loss.backward()
optimizer.step()
where the target sequence tgt is prefixed with a fixed number (0.1) to mimic the SOS token, and the output sequence output is shifted as well to mimic the EOS token. The transformer net is trained with the triangular target mask to mimic the auto-regression during the inference when the targer sequence is not available.
During the training, it goes like:
with torch.no_grad():
for src, tgt in test_loader:
net.eval()
outputs = torch.zeros(tgt.size())
temp = (torch.rand(tgt.size())*2-1)
temp[:,0,:] = 0.1*torch.ones(tgt[:,0,:].size()) #prefix to mimic SOS
for t in range(1, temp.size()[1]):
outputs = net(src=src, tgt=temp, device=device)
temp[:,t,:] = outputs[:,t-1,:] #is this correct?
outputs = net(src, temp, device=device) #is this correct?
print(criterion(outputs[:,:-1,:], tgt[:,1:,:]))
During the training, the training loss and validation loss (based on MSE) drop and converge smoothly. However, the testing loss turns out to be much larger than the aforementioned. Could anyone check it out if this is the correct way to do the inference of transformer model?
(Btw, I couldn't find many examples for univariate sequence regression transformer models on Google, any recommended links will be really appreciated!)
Hi I'm trying to build a simple neural network with tensorflow, where I give the model the training_data, which contains the standard values and i give it the target_data, which is the result I want it to have if the predicted value is near one of those numbers.
For example, if I give the y_test a value of 3.5, the model would predict and give a number close to 4. So the condition would say it was a lightsmoker. I searched a bit for activation functions and I learned I can't use sigmoid for what I want to do. I'm quite new on this matter. What i've done so far it's by error and trial.
import random
import tensorflow as tf
import numpy as np
training_data=[]
for i in range(0,5):
training_data.append([random.uniform(0,0.2944)])
for i in range(0,5):
training_data.append([random.uniform(0.2944,1.7394)])
for i in range(0,5):
training_data.append([random.uniform(1.7394,3.2394)])
for i in range(0,5):
training_data.append([random.uniform(3.2394,6)])
target_data=[]
for i in range(0,5):
target_data.append([1])
for i in range(0,5):
target_data.append([2])
for i in range(0,5):
target_data.append([3])
for i in range(0,5):
target_data.append([4])
y_test= np.array([100])
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(len(target_data),input_dim=1,activation='softmax'))
model.add(tf.keras.layers.Dense(1,activation='relu'))
model.compile( loss='mean_squared_error',
optimizer='adam',
metrics=['accuracy'])
training_data = np.asarray(training_data)
target_data = np.asarray(target_data)
model.fit(training_data, target_data, epochs=50, verbose=0)
target_pred= model.predict(y_test)
target_pred=float(target_pred)
print("X=%s, Predicted=%s" % (y_test, target_pred))
if( 0<= target_pred <= 1.5):
print("\nNon-Smoker")
elif(1.5<= target_pred <2.5):
print("\nPassive Smoker")
elif(2.5<= target_pred <3.5 ):
print("Lghtsmoker")
else:
print("Smoker\n")
Here is a helpful guide to using activation functions in the final layer as well as corresponding losses for different type of problems.
In your case, I am assuming you are working with a regression task with arbitrary values (any float value as output, not restricted between 0 to 1 or -1 to 1). So, skip the activation function and keep mse or mean_squared_error as your loss function.
EDIT:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(3,input_shape=(1,),activation='relu'))
model.add(tf.keras.layers.Dense(1))
You are defining your problem as a regression problem where the result of model.predict is a linear value. For that kind of situation the last layer in your model is a linear layer that does not have an activation function. For this kind of problem your loss as mse is fine. Now you could elect to define your problem as a classification problem. Where you have 3 classes, Non-Smoker, Passive-Smoker and Light smoker. Now in that case, your target data in training is not a number in the numerical sense but an integer that indicates which class the training sample represents. For example you could have Non_Smoker with the label 0, Passive_Smoker with the label 1 and Light_Smoker with the label 2. Now the last layer in your model would use a softmax activation function. In model.compile your loss would be sparse_categorical_crossentropy because your labels are integers. If you one-hot encode your labels, for example Non_Smoker coded as 100, Light_Smoker as 010 and Passive_Smoker coded as 001 then your loss fuction would be categorical_cross_entropy. Now when you ran model.predict on a test sample it will produce a list containing 3 probabilities. The first in the list is the probability for class 0 - Non_Smoker, second is the probability for class 1 Light Smoker and the third is the probability of the third class Passive_Smoker. Now what you do is use np.argmax to find which index has the highest probability value and that is then the model's prediction.
I am trying to get predictions from my sentiment analysis models that classify 500 worded News articles. The models validation loss and training loss is in are about the same and their scores are relatively high. However when I try to make predictions with them I get the same classification result in all of them regardless of the text input.
I believe that the problem might be on the way I am trying to make a prediction (I pad my string with spaced characters). I was hoping that someone here could shed some light on this issue (my code below). Thank you for your help
comment = 'SAMPLE TEXT STRING'
for i in range(300-len(comment.split(' '))):
apad += ' A'
comment = comment + apad
tok.fit_on_texts([comment])
X = tokenizer.texts_to_sequences([comment])
X = preprocessing.sequence.pad_sequences(X)
yhat = b.predict_classes(X)
print(yhat)
prediction = b.predict(X, batch_size=None, verbose=0, steps=None)
print(prediction)
The output of this script is below. Both prediction and predicted classes, are regardless of the text input always 0 for some reason:
[[0]] [[0.00645966]]
The problem seems to be with the tokenizer.
You can't fit the tokenizer again, because you will have different tokens for each word. You should fit the tokenizer only once before training and then save the tokens to be used with all new text.