I want a model that predicts only a certain syntactic category, for example verbs. Can I update the weights of the LSTM so that they are set to 1 if the word is a verb and 0 if it is any other category?
This is my current code:
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense
from keras.optimizers import RMSprop

model = Sequential()
model.add(Embedding(vocab_size, embedding_size, input_length=5, weights=[pretrained_weights]))
model.add(Bidirectional(LSTM(units=embedding_size)))
model.add(Dense(2000, activation='softmax'))
# print the trainable parameters of the output layer
for e in zip(model.layers[-1].trainable_weights, model.layers[-1].get_weights()):
    print('Param %s:\n%s' % (e[0], e[1]))
weights = [layer.get_weights() for layer in model.layers]
print(weights)
print(model.summary())
# compile network
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(lr=0.001),
              metrics=['accuracy'])
# fit network
history = model.fit(X_train_fit, y_train_fit, epochs=100, verbose=2, validation_data=(X_val, y_val))
score = model.evaluate(x=X_test, y=y_test, batch_size=32)
These are the weights that I am returning:
Param <tf.Variable 'dense_1/kernel:0' shape=(600, 2000) dtype=float32_ref>:
[[-0.00803087 0.0332068 -0.02052244 ... 0.03497869 0.04023124
-0.02789269]
[-0.02439511 0.02649114 0.00163587 ... -0.01433908 0.00598045
0.00556619]
[-0.01622458 -0.02026448 0.02620039 ... 0.03154427 0.00676246
0.00236203]
...
[-0.00233192 0.02012364 -0.01562861 ... -0.01857186 -0.02323328
0.01365903]
[-0.02556716 0.02962652 0.02400535 ... -0.01870854 -0.04620285
-0.02111554]
[ 0.01415684 -0.00216265 0.03434955 ... 0.01771339 0.02930249
0.002172 ]]
Param <tf.Variable 'dense_1/bias:0' shape=(2000,) dtype=float32_ref>:
[0. 0. 0. ... 0. 0. 0.]
[[array([[-0.023167 , -0.0042483, -0.10572 , ..., 0.089398 , -0.0159 ,
0.14866 ],
[-0.11112 , -0.0013859, -0.1778 , ..., 0.063374 , -0.12161 ,
0.039339 ],
[-0.065334 , -0.093031 , -0.017571 , ..., 0.16642 , -0.13079 ,
0.035397 ],
and so on.
Can I do it by updating the weights, or is there a more efficient way to output only verbs?
Thank you for the help!
In this model, with this loss (categorical_crossentropy), you cannot learn verb/non-verb labels without supervision, so you need labeled data. You could use a tagged corpus, e.g. the Penn Treebank, and train this model to take the input words and predict the output labels (a closed class of labels).
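For concreteness, here is a minimal sketch of how such labels could be derived from a tagged corpus. It assumes the NLTK Treebank sample has been downloaded (nltk.download('treebank')); the variable names are illustrative:
from nltk.corpus import treebank

# Penn Treebank verb tags all start with 'VB' (VB, VBD, VBG, VBN, VBP, VBZ)
tagged = treebank.tagged_words()  # list of (word, POS-tag) pairs
words = [word for word, tag in tagged]
labels = [1 if tag.startswith('VB') else 0 for word, tag in tagged]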
If you want a single tag per word, treated as a regression, you can change the model so the last layer outputs a value between 0 and 1:
model.add(Dense(1, activation='sigmoid'))
Then change the loss function to binary cross-entropy:
# compile network
model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(lr=0.001),
              metrics=['accuracy'])
Then, instead of word labels, y_train_fit should contain 1 and 0 values indicating whether each word is a verb.
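Putting the pieces together, a minimal sketch of the binary version of your model could look like this (it reuses vocab_size, embedding_size, and pretrained_weights from your code, and assumes y_train_fit now holds one 0/1 verb indicator per input window):
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense
from keras.optimizers import RMSprop

model = Sequential()
model.add(Embedding(vocab_size, embedding_size, input_length=5, weights=[pretrained_weights]))
model.add(Bidirectional(LSTM(units=embedding_size)))
model.add(Dense(1, activation='sigmoid'))  # one P(verb) score per window
model.compile(loss='binary_crossentropy', optimizer=RMSprop(lr=0.001), metrics=['accuracy'])
history = model.fit(X_train_fit, y_train_fit, epochs=100, verbose=2, validation_data=(X_val, y_val))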
Related
I cannot evaluate my model because I get an error when I try to print its accuracy.
How can I evaluate my model? I use an LSTM to generate new data from my dataset. I know about metrics like accuracy, precision, and recall, but every time I try to apply them to my generated data I run into this problem:
# scaled is my scaled dataset; it contains 6879 rows with values like:
# array([[0.        , 0.        , 0.        , 0.        , 0.        ],
#        [0.        , 0.25      , 0.        , 0.07142857, 0.        ],
#        [0.        , 0.875     , 0.        , 0.07142857, 0.        ],
#        ...,
#        [0.98828125, 0.375     , 0.92050207, 0.5       , 0.        ]])
from numpy import array
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_steps = 10
n_features = 5

def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences) - 1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
X, y = split_sequences(sequences=scaled, n_steps=n_steps)
print(X.shape, y.shape)
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.25, random_state=42)
# define model
LSTM_model = Sequential()
LSTM_model.add(LSTM(100, return_sequences=False, activation='relu', input_shape=(n_steps, n_features)))
#model.add(LSTM(100, activation='relu'))
LSTM_model.add(Dense(n_features))
LSTM_model.compile(optimizer='adam', loss='mse')
# fit model
LSTM_model.fit(xtrain, ytrain, epochs=10, batch_size=100, verbose=1)
LSTM_model.summary()
print(accuracy_score(ytest, LSTM_model.predict(xtest)[:,0,:]))
This is the error:
ValueError Traceback (most recent call last)
<ipython-input-203-0e337cd696dc> in <module>()
1 #yhat = Conv1D_model.predict(X, verbose=0)
----> 2 print(accuracy_score(ytest, Conv1D_model2.predict(xtest)[:,0,:]))
3
1 frames
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py in _check_targets(y_true, y_pred)
102 # No metrics support "multiclass-multioutput" format
103 if y_type not in ["binary", "multiclass", "multilabel-indicator"]:
--> 104 raise ValueError("{0} is not supported".format(y_type))
105
106 if y_type in ["binary", "multiclass"]:
ValueError: continuous-multioutput is not supported
Edit 1: From the comments, this does not look like a classification problem, so accuracy_score() won't work, since it requires discrete, comparable labels and predictions.
Without seeing how your predictions look (compared to your true labels), the biggest problem I see is that you're passing continuous predictions to accuracy_score(), and the comparisons will almost never match. ValueError: continuous-multioutput is not supported most likely means that the predictions are continuous float values while the true labels are integers or int-like values such as 0. or 1.. A prediction like 0.9876 will never exactly match 0. or 1., so you need to discretize the predictions with some function, probably by rounding.
The check if y_type not in ["binary", "multiclass", "multilabel-indicator"]: is also an indicator that it's looking for either [0, 1] (binary), [0, 1, 2, ..., n-1] (multiclass), or [[0, 1], [0, 2], [1, 2], ..., [n, m]] (multilabel).
"the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true"
Do something like this:
import numpy as np

preds = LSTM_model.predict(xtest)  # gets predictions as floats
preds = np.rint(preds)             # rounds to the nearest int, where >= .5 becomes 1
print(accuracy_score(ytest, preds))
Edit 0: You are also missing an activation function in your final Dense layer. Without one, you get a roughly linear projection of the previous layer's output through n_features neurons.
LSTM_model.add(Dense(n_features, activation="softmax"))
LSTM_model.compile(optimizer="adam", loss="mse")
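One caveat worth adding: softmax with an mse loss is an unusual pairing. Depending on what the five outputs actually represent, one of these sketched combinations is the more conventional choice:
# if the targets are continuous values (regression), keep a linear output:
LSTM_model.add(Dense(n_features))  # default linear activation
LSTM_model.compile(optimizer="adam", loss="mse")

# if the targets are mutually exclusive classes, pair softmax with a cross-entropy loss:
LSTM_model.add(Dense(n_features, activation="softmax"))
LSTM_model.compile(optimizer="adam", loss="categorical_crossentropy")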
Can someone please explain the dimensionality logic for the input X and the class Y for the sparse_categorical_crossentropy loss function?
I checked both the Keras and tf2 docs and examples, and this post: Cross Entropy vs Sparse, but one point is not clear to me.
Does the Y vector need to be expanded to the same number of columns as the number of classes the model outputs (if I use a softmax output), or does Keras expand Y automatically?
In my case, the input images are 32x32 and Y is an integer between 0 and 10, so the input is (batch_size, h, w) and Y is (batch_size, 1) with integer values 0...10.
X = (73257, 32, 32)
Y = (73257, 1)
model.fit(X, Y, epochs=30, validation_split=0.10, batch_size=1, verbose=True)
The model itself is just a Sequential stack of Dense layers with a softmax output.
model = Sequential()
model.add(Dense(32, activation='relu',
                input_shape=input_shape,
                kernel_initializer='he_uniform',
                bias_initializer='ones'))
# bunch of Dense layers and the output softmax
model.add(Dense(10, activation='softmax'))
The error is about dimensionality:
ValueError: Shape mismatch: The shape of labels (received (1, 1)) should equal the shape of logits except for the last dimension (received (1, 32, 10)).
Thank you.
As mentioned in that post, categorical cross-entropy (cce) and sparse categorical cross-entropy (scc) compute the same loss; they differ only in the expected format of the true label Y. Simply put: if Y is an integer, use scc; if Y is one-hot encoded, use cce. So for scc the ground truth Y is typically 1D, whereas for cce it is typically 2D. For the ground truth:
- (num_of_samples, n_class_one_hot_encode) <- for cce (2D)
- (num_of_samples, n_class_int) <- for scc (1D)
For example, if we use the cifar10 data set, we can do
import tensorflow as tf

(x_train, y_train), (_, _) = tf.keras.datasets.cifar10.load_data()
# train set / data
x_train = x_train.astype('float32') / 255
sparse = y_train
onehot = tf.keras.utils.to_categorical(y_train, num_classes=10)
print(sparse[:5])  # <--- (num_of_samples, n_class_int)
print(onehot[:5])  # <--- (num_of_samples, n_class_one_hot_encode)
[[6]
[9]
[9]
[4]
[1]]
[[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]
Now, let's define a simple model and train it with both label formats to see what happens.
def net():
    input = tf.keras.Input(shape=(32, 32, 3))
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(input)
    x = tf.keras.layers.MaxPooling2D(3)(x)
    x = tf.keras.layers.GlobalMaxPooling2D()(x)
    x = tf.keras.layers.Dense(10, activation='softmax')(x)
    model = tf.keras.Model(input, x)
    return model
Using cce
model = net()
model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics='accuracy',
    optimizer='adam')
his = model.train_on_batch(x_train, onehot, return_dict=True)
print(his)
{'loss': 2.376708984375, 'accuracy': 0.09651999920606613}
one_hot_pred = model.predict(x_train)
print(onehot[0])
print(one_hot_pred[0])
print(onehot[0].shape)
print(one_hot_pred[0].shape)
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0.1516315 0.1151238 0.11732318 0.10644271 0.08946694 0.1398355
0.05046898 0.04249624 0.11813554 0.06907552]
(10,)
(10,)
Now, using scc
model = net()
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics='accuracy',
    optimizer='adam')
his = model.train_on_batch(x_train, sparse, return_dict=True)
print(his)
{'loss': 2.331458806991577, 'accuracy': 0.10066000372171402}
sparse_pred = model.predict(x_train)
print(sparse[0])
print(sparse_pred[0])
print(sparse[0].shape)
print(sparse_pred[0].shape)
[6]
[0.07184976 0.08837385 0.06910037 0.12347631 0.09542189 0.09981853
0.11247937 0.06707954 0.14902702 0.12337337]
(1,)
(10,)
Observe that the gt and pred shapes for scc are (1,) and (10,). In this case, the loss computes the logarithm only of the output at the index the ground truth points to. For example, the gt here is 6, so from pred the loss computes only the logarithm of pred[6].
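As a quick numeric check of that claim (using sparse_pred from above), the scc loss on this single sample should equal -log(pred[6]) up to float precision:
import numpy as np

scc = tf.keras.losses.SparseCategoricalCrossentropy()
print(scc(np.array([6]), sparse_pred[:1]).numpy())  # loss computed by Keras
print(-np.log(sparse_pred[0][6]))                   # manual -log(pred[6]); both ~2.185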
I've been writing some custom layers and have realized that my bias values train but my weights do not. I'm going to use a very simplified example here to illustrate the issue.
import tensorflow as tf
from tensorflow.keras.layers import Layer

class myWeights(Layer):
    def __init__(self, units, **kwargs):
        self.units = units
        super(myWeights, self).__init__(**kwargs)

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='GlorotUniform',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)
        super(myWeights, self).build(input_shape)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.units)
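The layer itself seems to behave when exercised in isolation; a small smoke test, assuming the imports above:
x = tf.random.normal((4, 8))             # batch of 4 samples, 8 features
layer = myWeights(32)
y = layer(x)                             # triggers build(), then call()
print(y.shape)                           # (4, 32)
print(len(layer.trainable_weights))      # 2: the kernel w and the bias b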
Now I set up MNIST data to train. I also set a seed so this is reproducible on your end.
tf.random.set_seed(1234)
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train=tf.keras.utils.normalize(x_train, axis=1)
x_test=tf.keras.utils.normalize(x_test, axis=1)
I build out the model using the functional API
from tensorflow.keras.layers import Input, Flatten, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(x_train.shape[1:]))
flat = Flatten()(inp)
hid = myWeights(32)(flat)
out = Dense(10, 'softmax')(hid)
model = Model(inp, out)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Now when I check the values of the parameters using
print(model.layers[2].get_weights())
I see output like the following, which I have reformatted for easier reading.
[array([[ 0.00652369, -0.02321771, 0.01399945, ..., -0.07599965,
-0.04356881, -0.0333882 ],
[-0.03132245, -0.05264733, 0.05576386, ..., -0.03755575,
0.07358163, -0.02338506],
[-0.01808248, 0.04092623, 0.02177643, ..., 0.00971264,
0.07631209, 0.0495184 ],
...,
[-0.03780914, 0.00219346, 0.04460619, ..., -0.06703794,
0.03407502, -0.01071112],
[-0.0012739 , -0.0683699 , -0.06152753, ..., 0.05373723,
0.03079057, 0.00855774],
[ 0.06245673, -0.07649396, 0.06748571, ..., -0.06948434,
-0.01416317, -0.08318184]], dtype=float32),
array([ 0.05734033, 0.04822996, 0.04391507, -0.01550511, 0.05383257,
0.05043739, -0.04092903, -0.0081823 , -0.06425817, 0.02402171,
-0.00374672, -0.06069579, -0.08422226, 0.02909392, -0.02071654,
0.0422841 , -0.05020861, 0.01267704, 0.0365625 , -0.01743891,
-0.01030697, 0.00639807, -0.01493454, 0.03214667, 0.03262959,
0.07799669, 0.05789128, 0.01754347, -0.07558075, 0.0466203 ,
-0.05332188, 0.00270758], dtype=float32)]
After training with
model.fit(x_train,y_train, epochs=3, verbose=1)
print(model.layers[2].get_weights())
I find the following output.
[array([[ 0.00652369, -0.02321771, 0.01399945, ..., -0.07599965,
-0.04356881, -0.0333882 ],
[-0.03132245, -0.05264733, 0.05576386, ..., -0.03755575,
0.07358163, -0.02338506],
[-0.01808248, 0.04092623, 0.02177643, ..., 0.00971264,
0.07631209, 0.0495184 ],
...,
[-0.03780914, 0.00219346, 0.04460619, ..., -0.06703794,
0.03407502, -0.01071112],
[-0.0012739 , -0.0683699 , -0.06152753, ..., 0.05373723,
0.03079057, 0.00855774],
[ 0.06245673, -0.07649396, 0.06748571, ..., -0.06948434,
-0.01416317, -0.08318184]], dtype=float32),
array([-0.250459 , -0.21746232, 0.01250297, 0.00065066, -0.09093136,
0.04943814, -0.13446714, -0.11985168, 0.23259214, -0.14288908,
0.03274751, 0.1462888 , -0.2206902 , 0.14455307, 0.17767513,
0.11378342, -0.22250313, 0.11601174, -0.1855521 , 0.0900097 ,
0.21218981, -0.03386492, -0.06818825, 0.34211585, -0.24891953,
0.08827516, 0.2806849 , 0.07634751, -0.32905066, -0.1860122 ,
0.06170518, -0.20212872], dtype=float32)]
I can see that the bias values have changed but the weight values are static. I'm not sure at all why this is occurring.
What you're building is a Multilayer Perceptron (MLP). An MLP is usually composed of one (passthrough) input layer, one or more layers of TLUs called hidden layers, and one final layer of TLUs called the output layer.
Here the signal flows in only one direction (from the inputs to the outputs), so this architecture is an example of a feedforward neural network (FNN).
See this link, which explains feedforward neural networks.
Coming to your code: you are initializing the weights with an initializer. The first initialization of the weights happens at the hidden layer, and the weights then get updated in the next Dense layer.
So the weights in the hidden layer remain as initialized even after training, since this is a feedforward network and the layer does not depend on the output of the current layer.
But if you want to check your code, you can add one more hidden layer exactly like the existing one and inspect the weights of layer 3 (hidden layer 2), which looks something like this:
inp = Input(shape=(x_train.shape[1:]))
flat = Flatten()(inp)
hid = myWeights(32)(flat)
hid2 = myWeights(32)(hid)
out = Dense(10, 'softmax')(hid2)
model = Model(inp, out)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Printing the weights of the hidden2 layer before and after fit will then show different weights, since the weights of the hidden2 layer depend on the output of the hidden1 layer.
print(model.layers[3].get_weights())
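To check whether the kernel is really frozen, rather than just looking unchanged in the truncated printout, you can also diff the full arrays; a small sketch along these lines:
import numpy as np

w_before = [w.copy() for w in model.layers[2].get_weights()]
model.fit(x_train, y_train, epochs=3, verbose=1)
w_after = model.layers[2].get_weights()
for before, after in zip(w_before, w_after):
    # compare the whole array, not just the corners shown by print()
    print(before.shape, 'changed:', not np.allclose(before, after))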
I have the following NN:
cc = Input(shape=(3,))
dd = Dense(1,activation='tanh')(cc)
dense_model3 = Model(inputs=cc, outputs=dd)
# Compile
dense_model3.compile(optimizer='adam', loss='mean_squared_error')
dense_model3.fit(copstage3,y_stage9, batch_size=150, epochs=100)
ypredi3 = dense_model3.predict(copstage3,batch_size=150, steps = None)
and when I use dense_model3.get_weights(), I get:
[array([[0.15411839],
        [1.072346  ],
        [0.37893268]], dtype=float32), array([-0.13432428], dtype=float32)]
However, as I have 150 rows in my data, I would expect 150 different weights, one representing each row. What am I missing?
Your model has an input of size 3,
cc = Input(shape=(3,))
and an output of size 1,
dd = Dense(1,activation='tanh')(cc)
There are no intermediate layers, so the only parameters are those of this single Dense layer, as given:
[array([[0.15411839],
        [1.072346  ],
        [0.37893268]], dtype=float32), array([-0.13432428], dtype=float32)]
Where
[array([[0.15411839], [1.072346 ], [0.37893268]], dtype=float32)
holds the three weights connecting the three inputs to the single output neuron, and
array([-0.13432428], dtype=float32)
is the bias of that output neuron.
The 150 rows of data are used to train this layer; after training, the weights belong to the individual neurons, not to the rows.
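In other words, the number of parameters is fixed by the layer shapes, not by the number of training rows; a quick way to confirm:
print(dense_model3.count_params())  # 3 kernel weights + 1 bias = 4, regardless of the 150 rows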
Hope this helps.
I am trying to implement a simple neural network for multi-class classification in Keras. The code is:
model = Sequential()
model.add(Dense(512, input_dim=55, kernel_regularizer=l2(0.00001),
                activation='relu'))
model.add(Dense(8, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, dummy_y, epochs=20, batch_size=30, class_weight=class_weights)
I have 55 features and I want to predict one of 8 classes (0,1,2,3,4,5,6,7). I also encode y_train like this:
encoder = LabelEncoder()
encoder.fit(y_train)
encoded_Y = encoder.transform(y_train)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
However, when I use predict(), the output is an array with the probability of each class:
array([[3.3881092e-01, 2.6201099e-06, 1.9504215e-03, ..., 7.0641324e-02,
4.4026113e-01, 1.2641836e-02],
[2.3457911e-02, 5.5409328e-04, 2.8759112e-05, ..., 2.1585675e-03,
5.5625242e-01, 1.0208529e-01],
[4.6981460e-01, 2.0882198e-05, 1.4895502e-01, ..., 1.3179567e-01,
2.2908358e-01, 1.4160757e-03],
...
How should I modify the network in order to output the class with the highest probability? Like this:
[[0,5,7,3,2,0,0,.....]]
You can simply use the predict_classes method (note that it is only available on Sequential models and has been removed in recent TensorFlow versions):
preds_classes = model.predict_classes(X_test)
The numbers you see as the output of the predict method are the probability (or confidence) scores for each class. As an alternative solution, you can take the index of the maximum score, which corresponds to the predicted class:
import numpy as np
probs = model.predict(X_test)
classes = np.argmax(probs, axis=-1)
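If you need the original labels rather than integer indices, the LabelEncoder from the question can map the predictions back:
original_labels = encoder.inverse_transform(classes)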