I'm building a character-based RNN model using Keras (Theano backend). One thing to note is that I don't want to use a prebuilt loss function. Instead, I want to calculate the loss only for some of the data points. Here's what I mean.
The vectorized training set and its labels look like this:
X_train = np.array([[0,1,2,3,4]])
y_train = np.array([[1,2,3,4,5]])
But I replaced the first k elements in y_train with 0. So, for example, the new y_train is
y_train = np.array([[0,0,3,4,5]])
The reason I set the first two elements to 0 is that I don't want to include them when computing the loss. In other words, I want to calculate the loss only between X_train[2:] and y_train[2:].
Here's my attempt.
import numpy as np
np.random.seed(0) # for reproducibility
from keras.preprocessing import sequence
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Embedding
from keras.layers import LSTM
from keras.layers.wrappers import TimeDistributed
X_train = np.array([[0,1,2,3,4]])
y_train = np.array([[0,0,3,4,5]])
y_3d = np.zeros((y_train.shape[0], y_train.shape[1], 6))
for i in range(y_train.shape[0]):
    for j in range(y_train.shape[1]):
        y_3d[i, j, y_train[i,j]] = 1
model = Sequential()
model.add(Embedding(6, 5, input_length=5, dropout=0.2))
model.add(LSTM(5, input_shape=(5, 12), return_sequences=True) )
model.add(TimeDistributed(Dense(6))) #output classes =6
model.add(Activation('softmax'))
from keras import backend as K
import theano.tensor as T
def custom_objective(y_true, y_pred):
    # Find the last index of the minimum value in y_true, axis=-1
    # For example, y_train = np.array([[0,0,3,4,5]]) in my example, and
    # I'd like to calculate the loss only between X_train[3:] and y_train[3:] because the values
    # in y_train[:3] (i.e. 0) are dummies. The following is pseudo code if y_true is a 1-d numpy array, which is not true.
    def rindex(y_true):
        for i in range(len(y_true), -1, -1):
            if y_true(i) == 0:
                return i
    starting_point = rindex(y_true)
    return K.categorical_crossentropy(y_pred[starting_point:], y_true[starting_point:])
model.compile(loss=custom_objective,
optimizer='adam',
metrics=['accuracy'])
model.fit(X_train, y_t, batch_size=batch_size, nb_epoch=1)
Apart from minor errors like calling y_true with parentheses instead of indexing it with brackets, and the wrong variable name (y_t) in the last line, there are two problems with your code.
First, the model you defined returns a matrix of probability distributions (due to the softmax activation) over the classes at each timestep.
But in custom_objective you are treating the output as vectors. You already correctly transform y_train into such a matrix above.
So you would first have to get the actual predictions; the simplest approach is to assign the class with the highest probability, i.e.:
y_pred = y_pred.argmax(axis=2)
y_true = y_true.argmax(axis=2) # this reconstructs y_train resp. a subset thereof
The second problem is that you are treating these like real variables (numpy arrays).
However, y_true and y_pred are symbolic tensors. The error you get clearly states one of the resulting problems:
TypeError: object of type 'TensorVariable' has no len()
TensorVariables have no length, since it is simply not known before real values are inserted! This also makes iterating over them the way you implemented it impossible.
By the way, in cases where you do iterate over real vectors backwards, you might want to do it like this: range(len(y_true)-1, -1, -1) so you don't go out of bounds, or even for val in y_true[::-1].
To achieve what you want, you need to treat the corresponding variables as what they are and use methods supplied for tensors.
The center of this calculation is the argmin function, which finds the minimum. By default it returns the first occurrence of that minimum.
Since you want the last occurrence, you need to apply it to the reversed tensor and convert the result back into an index into the original vector:
starting_point = y_true.shape[0] - y_true[::-1].argmin() - 1
Possibly there is an even simpler solution to your problem, as it looks like you are trying to implement something like masking.
You might want to take a look at the mask_zero=True flag for Embedding layers. Note that this works on the input side, though.
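A minimal, untested sketch of that alternative, reusing the layer sizes from your model (note that it masks timesteps whose input index is 0, which is not quite the same as masking based on the target values):
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Activation
from keras.layers.wrappers import TimeDistributed
model = Sequential()
# mask_zero=True: timesteps whose input index is 0 are skipped by downstream layers
model.add(Embedding(6, 5, input_length=5, mask_zero=True))
model.add(LSTM(5, return_sequences=True))
model.add(TimeDistributed(Dense(6)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])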
I wish to examine the values of a tensor after the mask is applied to it.
Here is a truncated part of the model. I assign temp = x so that I can later print temp to check the exact values.
The setting is a 4-class classification model using acoustic features. Assume I have data of shape (1000, 50, 136) as (batch, timesteps, features).
The objective is to check whether the model is learning the features timestep by timestep. In other words, we want to reassure ourselves that the model learns from slices like the red rectangle in the picture. Logically, that is how a Keras LSTM layer should work, but the confusion matrix produced changes quite a bit when a parameter changes (e.g. the number of Dense units). The validation accuracy stays at 45%, so we would like to visualize the model.
The proposed idea is to print out the first timestep of the first batch and print out the input inside the model. If they are the same, then the model is learning the right way ((136,1) features at a time) instead of (50,1) timesteps of a single feature at a time.
input_feature = Input(shape=(X_train.shape[1],X_train.shape[2]))
x = Masking(mask_value=0)(input_feature)
temp = x
x = Dense(Dense_unit,kernel_regularizer=l2(dense_reg), activation='relu')(x)
I have tried tf.print(), which gave me AttributeError: 'Tensor' object has no attribute '_datatype_enum'.
And as suggested by Lescurel in Get output from a non final keras model layer:
model2 = Model(inputs=[input_attention, input_feature], outputs=model.get_layer('masking')).output
print(model2.predict(X_test))
AttributeError: 'Masking' object has no attribute 'op'
You want the output after the mask is applied.
Lescurel's link in the comment shows how to do that.
This link to GitHub does, too.
You need to make a new model that
takes as inputs the input of your model, and
takes as outputs the output of the layer you are interested in.
I tested it with some made-up code derived from your snippets.
import numpy as np
from keras import Input
from keras.layers import Masking, Dense
from keras.regularizers import l2
from keras.models import Sequential, Model
X_train = np.random.rand(4,3,2)
Dense_unit = 1
dense_reg = 0.01
mdl = Sequential()
mdl.add(Input(shape=(X_train.shape[1],X_train.shape[2]),name='input_feature'))
mdl.add(Masking(mask_value=0,name='masking'))
mdl.add(Dense(Dense_unit,kernel_regularizer=l2(dense_reg),activation='relu',name='output_feature'))
mdl.summary()
mdl2mask = Model(inputs=mdl.input,outputs=mdl.get_layer("masking").output)
maskoutput = mdl2mask.predict(X_train)
mdloutput = mdl.predict(X_train)
maskoutput # print output after/of masking
mdloutput # print output of mdl
maskoutput.shape #(4, 3, 2): masking has the shape of the layer before (input here)
mdloutput.shape #(4, 3, 1): shape of the output of dense
I am trying to train a network on the Swiss Roll dataset with three features X = [x1, x2, x3] for the classification task. There are four classes with labels 1, 2, 3, 4, and the vector y contains the labels for all the data.
A row in the X matrix looks like this:
-5.2146470e+00 7.0879738e+00 6.7292474e+00
The shape of X is (100, 3), and the shape of y is (100,).
I want to use Radial Basis Functions to train this model. I have used the custom RBFLayer from this StackOverflow answer (also see this explanation) to build the RBFLayer. I want to use a couple of Keras Dense layers to build the network for classification.
What I have tried so far
I have used a Dense layer for the first layer, followed by the custom RBFLayer, and two other Dense layers. Here's the code:
model = Sequential()
model.add((Dense(100, input_dim=3)))
# number of units = 10, gamma = 0.05
model.add(RBFLayer(10,0.05))
model.add(Dense(15, activation='relu'))
model.add(Dense(1, activation='softmax'))
This model gives me zero accuracy. I think there is something wrong with the model architecture, but I can't figure out what is the issue.
Also, I thought the number of units in the last Dense layer should match the number of classes, which is 4 in this case. But when I set the number of units to 4 in the last layer, I get the following error:
ValueError: Shapes (None, 1) and (None, 4) are incompatible
Can you help me with this model architecture?
I faced the same issue while practicing multi-class classification, where I had 7 features and the model classified into 7 classes. I tried encoding the labels and it fixed the issue.
First, import the LabelEncoder class from sklearn and to_categorical from tensorflow:
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical
Then, create a LabelEncoder object and transform your labels before fitting and training the model:
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)
y = to_categorical(y)
Note that you have to use np.argmax to get the actual predicted classification. In my case, the prediction is stored in a variable called res:
res = np.argmax(res, axis=None, out=None)
You can get your actual predicted class after this line. I hope this solves your problem.
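For completeness, a small hypothetical sketch of how the encoding and the argmax decoding fit together (the variable names are made up; with four classes, the model's last layer also needs Dense(4, activation='softmax') rather than Dense(1)):
import numpy as np
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical
y = np.array([1, 2, 3, 4, 2, 1])                      # hypothetical labels with classes 1..4
encoder = LabelEncoder()
y_onehot = to_categorical(encoder.fit_transform(y))   # shape (6, 4), one column per class
# after training a model whose last layer is Dense(4, activation='softmax'):
# res = model.predict(X_test)
# predicted = encoder.inverse_transform(np.argmax(res, axis=1))  # back to the original labels 1..4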
There are four classes with labels 1, 2, 3, 4, and the vector y contains the labels for all the data.
The simplest sanity check for input/output matching is to print the shapes of the inputs and outputs for a single batch and then compare them.
The RBF layer should not be a problem, because the output is taken from the last Dense layer rather than the RBF layer.
For a classification problem, the number of nodes in the last layer must equal the number of classes; for regression, the last layer usually has a single node.
You should print (pseudo code):
print(input.shape)
compare it with
print(model.input_shape)
then at output
print(output.shape)
then compare it with
print(model.predict(input).shape)
You can find the correct syntax in the Keras docs; the above is only approximately correct syntax / pseudo code.
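As a hedged, concrete illustration of those checks (a made-up model and random data, not the RBF model from the question):
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
X = np.random.rand(100, 3)
y = np.random.randint(0, 4, size=(100,))
model = Sequential([Dense(16, activation='relu', input_dim=3),
                    Dense(4, activation='softmax')])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
print(X.shape)                  # (100, 3)  -> your input batch
print(model.input_shape)        # (None, 3) -> what the model expects
print(y.shape)                  # (100,)    -> your labels
print(model.predict(X).shape)   # (100, 4)  -> one probability per class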
As the question says, I can only predict from my model with model.predict_on_batch(). Keras tries to concatenate everything together if I use model.predict() and that doesn't work.
For my application (a sequence-to-sequence model) it is faster to do the grouping on the fly. But even if I had done it in Pandas and then only used the Dataset for the padded batches, .predict() still wouldn't work?
If I can get predict_on_batch to work, then that's what works. But I can only predict on the first batch of the Dataset. How do I get predictions for the rest? I can't loop over the Dataset, I can't consume it...
Here's a smaller code example. The group is the same as the labels, but in the real world they are obviously two different things. There are 3 classes, a maximum of 2 values in a sequence, and 2 rows of data per batch. There are a lot of comments, and I nicked parts of the windowing from somewhere on StackOverflow. I hope it is fairly legible to most.
If you have any other suggestions on how to improve the code, please comment. But no, that's not what my model looks like at all. So suggestions for that part probably aren't helpful.
EDIT: Tensorflow version 2.1.0
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Bidirectional, Masking, Input, Dense, GRU
import random
import numpy as np
random.seed(100)
# input data
feature = list(range(3, 14))
# shuffle data
random.shuffle(feature)
# make label from feature data, +1 because we are padding with zero
label = [feat // 5 +1 for feat in feature]
group = label[:]
# random.shuffle(group)
max_group = 2
batch_size = 2
print('Data:')
print(*zip(group, feature, label), sep='\n')
# make dataset from data arrays
ds = tf.data.Dataset.zip((tf.data.Dataset.from_tensor_slices({'group': group, 'feature': feature}),
tf.data.Dataset.from_tensor_slices({'label': label})))
# group by window
ds = ds.apply(tf.data.experimental.group_by_window(
# use feature as key (you may have to use tf.reshape(x['group'], []) instead of tf.cast)
key_func=lambda x, y: tf.cast(x['group'], tf.int64),
# convert each window to a batch
reduce_func=lambda _, window: window.batch(max_group),
# use batch size as window size
window_size=max_group))
# shuffle at most 100k rows, but commented out because we don't want to predict on shuffled data
# ds = ds.shuffle(int(1e5))
ds = ds.padded_batch(batch_size,
padded_shapes=({s: (None,) for s in ['group', 'feature']},
{s: (None,) for s in ['label']}))
# show dataset contents
print('Result:')
for element in ds:
    print(element)
# Keras matches the name in the input to the tensor names in the first part of ds
inp = Input(shape=(None,), name='feature')
# RNNs require an additional rank, even if it is a degenerate dimension
duck = tf.expand_dims(inp, axis=-1)
rnn = GRU(32, return_sequences=True)(duck)
# again Keras matches names
out = Dense(max(label)+1, activation='softmax', name='label')(rnn)
model = Model(inputs=inp, outputs=out)
model.summary()
model.compile(loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(ds, epochs=3)
model.predict_on_batch(ds)
You can iterate over the dataset like so, remembering what is "x" and what is "y" in the typical notation:
for item in ds:
    xi, yi = item
    pi = model.predict_on_batch(xi)
    print(xi["group"].shape, pi.shape)
Of course, this predicts on each element individually. Otherwise you'd have to define the batches yourself, by batching matching shapes together, as the batch size itself is allowed to be variable.
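If you want to keep all predictions, one option (a sketch, not part of the original answer) is to collect them in a Python list; since the padded sequence length can differ between batches, a plain list is safer than concatenating everything into one array:
all_preds = []
for xi, yi in ds:
    all_preds.append(model.predict_on_batch(xi))   # one array of shape (batch, timesteps, classes) per batch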
I am new to machine learning. I tried the MNIST dataset and got an accuracy of around 97%, but then I tried working on my own image dataset and got an accuracy of 0%. Please help me out.
This is the 97% accuracy model code:
import tensorflow as tf
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv2D, Flatten
from keras.callbacks import ModelCheckpoint
# x_train, y_train, x_test, y_test hold the MNIST digits
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)
model = Sequential()
model.add(Flatten())
model.add(Dense(128, activation = 'relu'))
model.add(Dense(128, activation = 'relu'))
model.add(Dense(10, activation = 'softmax'))
model.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
checkpointer = ModelCheckpoint(filepath = 'mnist.model.weights.best.hdf5',verbose = 1,save_best_only = True, monitor = 'loss')
model.fit(x_train, y_train, epochs = 3, callbacks = [checkpointer],
batch_size = 32,verbose = 2,shuffle = True)
Now I tried with my 10 images and none of them were predicted correctly.
Below is the code:
from skimage import io
from skimage import color
import numpy as np
import tensorflow as tf
import keras
img_0 = color.rgb2gray(io.imread("0.jpg"))
img_2 = color.rgb2gray(io.imread("2.jpg"))
img_3 = color.rgb2gray(io.imread("3.jpg"))
img_4 = color.rgb2gray(io.imread("4.jpg"))
img_5 = color.rgb2gray(io.imread("5.jpg"))
img_6 = color.rgb2gray(io.imread("6.jpg"))
img_7 = color.rgb2gray(io.imread("7.jpg"))
img_8 = color.rgb2gray(io.imread("8.jpg"))
img_9 = color.rgb2gray(io.imread("9.jpg"))
array = [img_0, img_2, img_3, img_4, img_5, img_6, img_7, img_8, img_9]
#normalized the data between 0-1
array = tf.keras.utils.normalize(array, axis = 1)
#used the loop to increase the dimensions of the input layer as 1,28,28 which will be converted into 1*784
for i in array:
    i = np.expand_dims(i, axis=0)
    print(i.shape)
new_model = tf.keras.models.load_model('mnist_save.model')
new_model.load_weights('mnist.model.weights.best.hdf5')
predictions = new_model.predict(array)
Can you please help me out with my problem.
If I were you, I would check the following three things.
1. Visualize both training and testing data side by side
This is the simplest way to see whether the low performance is reasonable. Basically, if the testing data looks very different from the training data, there is no way for your pretrained model to achieve high performance in this new testing domain. Even if that is not the case, visualization is helpful for deciding what simple domain adaptation could be applied to achieve better performance.
2. Double-check your L2 normalization
I took a look at the source code of keras.utils.normalize:
@tf_export('keras.utils.normalize')
def normalize(x, axis=-1, order=2):
    """Normalizes a Numpy array.

    Arguments:
        x: Numpy array to normalize.
        axis: axis along which to normalize.
        order: Normalization order (e.g. 2 for L2 norm).

    Returns:
        A normalized copy of the array.
    """
    l2 = np.atleast_1d(np.linalg.norm(x, order, axis))
    l2[l2 == 0] = 1
    return x / np.expand_dims(l2, axis)
Since you are using the TensorFlow backend, what does normalizing along the 1st axis mean? Normalizing each row? That is strange. The right way to do normalization here is to (1) vectorize your input images, i.e. flatten each image into a vector; and (2) normalize the resulting vectors (at axis=1).
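A small illustration of that, assuming a hypothetical batch of grayscale images:
import numpy as np
import tensorflow as tf
imgs = np.random.rand(10, 28, 28)              # hypothetical batch of grayscale images
flat = imgs.reshape(len(imgs), -1)             # (10, 784): one vector per image
flat = tf.keras.utils.normalize(flat, axis=1)  # L2-normalize each image vector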
Actually, L2 normalization is somewhat inappropriate here anyway, especially when you want to apply a pretrained model in a different domain, because it is sensitive to the nonzero values. MNIST samples are almost binarized, i.e. either 0s or 1s. However, in a grayscale image you may see values across [0,255], which is a completely different distribution.
You may try the simple (0,1) normalization, i.e.
x_normalized = (x-min(x))/(max(x)-min(x))
but this requires you to retrain a new model.
3. Apply domain adaptation techniques
This means you want to do the following things before feeding a testing image to your model (even before normalization).
binarize your testing image, i.e. convert it to a 0/1 image
negate your testing image, i.e. turn 0s into 1s and 1s into 0s
centralize your testing image, i.e. shift it such that its mass center is the image center.
Of course, which techniques to apply depends on the domain differences that you observe in the visualization results.
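A rough sketch of all three steps, assuming a grayscale image in [0, 1] with a dark digit on a light background (the 0.5 threshold and the use of scipy.ndimage are my own choices, not part of the answer above):
import numpy as np
from scipy import ndimage
def adapt(img):
    img = (img > 0.5).astype(float)          # 1. binarize
    img = 1.0 - img                          # 2. negate: dark digit on light paper -> white-on-black like MNIST
    cy, cx = ndimage.center_of_mass(img)     # 3. centralize: shift the mass center to the image center
    img = ndimage.shift(img, (img.shape[0] / 2 - cy, img.shape[1] / 2 - cx))
    return img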
I'm using Keras with the TensorFlow backend. My goal is to query the batch size of the current batch from within a custom loss function. This is needed to compute values of the custom loss function which depend on the index of particular observations. I'd like to make this clearer with the minimal reproducible examples below.
(BTW: of course I could use the batch size defined for the training procedure and plug its value in when defining the custom loss function, but there are reasons why it can vary; in particular, if epochsize % batchsize (epoch size modulo batch size) is not zero, the last batch of an epoch has a different size. I didn't find a suitable approach on StackOverflow, e.g.
Tensor indexing in custom loss function and Tensorflow custom loss function in Keras - loop over tensor and Looping over a tensor, because obviously the shape of any tensor can't be inferred when building the graph, which is the case for a loss function; shape inference is only possible when evaluating given the data, which is only possible given the graph. Hence I need to tell the custom loss function to do something with particular elements along a certain dimension without knowing the length of that dimension.)
(this is the same in all examples)
from keras.models import Sequential
from keras.layers import Dense, Activation
# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
example 1: nothing special, no issue, no custom loss
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)
(Output omitted, this runs perfectly fine)
example 2: nothing special, with a fairly simple custom loss
def custom_loss(yTrue, yPred):
    loss = np.abs(yTrue - yPred)
    return loss
model.compile(optimizer='rmsprop',
loss=custom_loss,
metrics=['accuracy'])
# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)
(Output omitted, this runs perfectly fine)
example 3: the issue
def custom_loss(yTrue, yPred):
    print(yPred) # Output: Tensor("dense_2/Sigmoid:0", shape=(?, 1), dtype=float32)
    n = yPred.shape[0]
    for i in range(n): # TypeError: __index__ returned non-int (type NoneType)
        loss = np.abs(yTrue[i] - yPred[int(i/2)])
    return loss
model.compile(optimizer='rmsprop',
loss=custom_loss,
metrics=['accuracy'])
# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)
Of course the tensor has no shape info yet, and it can't be inferred when building the graph, only at training time. Hence for i in range(n) raises an error. Is there any way to do this?
The traceback of the output:
-------
BTW, here's my true custom loss function in case there are any questions. I skipped it above for clarity and simplicity.
def neg_log_likelihood(yTrue, yPred):
    yStatus = yTrue[:,0]
    yTime = yTrue[:,1]
    n = yTrue.shape[0]
    for i in range(n):
        s1 = K.greater_equal(yTime, yTime[i])
        s2 = K.exp(yPred[s1])
        s3 = K.sum(s2)
        logsum = K.log(s3)
        loss = K.sum(yStatus[i] * yPred[i] - logsum)
    return loss
Here's an image of the partial negative log-likelihood of the Cox proportional hazards model.
This is to clarify a question in the comments to avoid confusion. I don't think it is necessary to understand this in detail to answer the question.
As usual, don't loop. There are severe performance drawbacks and also bugs. Use only backend functions unless it's totally unavoidable (and it usually is avoidable).
Solution for example 3:
So, there is a very weird thing there...
Do you really want to simply ignore half of your model's predictions? (Example 3)
Assuming this is true, just duplicate your tensor in the last dimension, flatten it, and discard half of it. You get exactly the effect you want.
def custom_loss(true, pred):
    n = K.shape(pred)[0:1]
    pred = K.concatenate([pred]*2, axis=-1) #duplicate in the last axis
    pred = K.flatten(pred)                  #flatten
    pred = K.slice(pred,                    #take only half (= n samples)
                   K.constant([0], dtype="int32"),
                   n)
    return K.abs(true - pred)
Solution for your loss function:
If you have the times sorted from greater to lower, just do a cumulative sum.
Warning: If you have one time per sample, you cannot train with mini-batches!!!
batch_size = len(labels)
It makes sense to have time in an additional dimension (many times per sample), as is done in recurrent and 1D convolutional networks. Anyway, considering your example as expressed, that is shape (samples_equal_times,) for yTime:
def neg_log_likelihood(yTrue, yPred):
    yStatus = yTrue[:,0]
    yTime = yTrue[:,1]
    n = K.shape(yTrue)[0]

    #sort the times and everything else from greater to lower:
    #obs: you can have the data sorted already and avoid doing it here for performance
    #important: yTime will be sorted in the last dimension, make sure it's (None,) in this case
    #           or (None, time_length) in the case of many times per sample
    sortedTime, sortedIndices = tf.math.top_k(yTime, n, True)
    sortedStatus = K.gather(yStatus, sortedIndices)
    sortedPreds = K.gather(yPred, sortedIndices)

    #do the calculations
    exp = K.exp(sortedPreds)
    sums = K.cumsum(exp) #this will have the sum for j >= i in the loop
    logsums = K.log(sums)

    return K.sum(sortedStatus * sortedPreds - logsums)
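A hypothetical sketch of how this loss might be wired up, following the warning above about full-batch training (the names X, status, time and model are assumptions, not taken from the question):
import numpy as np
# hypothetical arrays: 'status' is the 0/1 event indicator and 'time' the observed time per sample;
# the loss above expects them packed as columns 0 and 1 of yTrue
y_packed = np.stack([status, time], axis=1)
model.compile(optimizer='adam', loss=neg_log_likelihood)
# full batch, per the warning above that mini-batches don't work with one time per sample
model.fit(X, y_packed, batch_size=len(y_packed), epochs=10)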