I've been writing some custom layers and I have realized that my bias values will train but my weights are not training. I'm going to use very simplified code here to illustrate the issue.
class myWeights(Layer):
    def __init__(self, units, **kwargs):
        self.units = units
        super(myWeights, self).__init__(**kwargs)

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='GlorotUniform',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)
        super(myWeights, self).build(input_shape)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.units)
Now I set up MNIST data to train. I also set a seed so this is reproducible on your end.
tf.random.set_seed(1234)
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train=tf.keras.utils.normalize(x_train, axis=1)
x_test=tf.keras.utils.normalize(x_test, axis=1)
I build out the model using the functional API
inp = Input(shape=(x_train.shape[1:]))
flat = Flatten()(inp)
hid = myWeights(32)(flat)
out = Dense(10, 'softmax')(hid)
model = Model(inp, out)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Now when I check the values of the parameters using
print(model.layers[2].get_weights())
I see output like the following, which I have reformatted for easier reading.
[array([[ 0.00652369, -0.02321771, 0.01399945, ..., -0.07599965,
-0.04356881, -0.0333882 ],
[-0.03132245, -0.05264733, 0.05576386, ..., -0.03755575,
0.07358163, -0.02338506],
[-0.01808248, 0.04092623, 0.02177643, ..., 0.00971264,
0.07631209, 0.0495184 ],
...,
[-0.03780914, 0.00219346, 0.04460619, ..., -0.06703794,
0.03407502, -0.01071112],
[-0.0012739 , -0.0683699 , -0.06152753, ..., 0.05373723,
0.03079057, 0.00855774],
[ 0.06245673, -0.07649396, 0.06748571, ..., -0.06948434,
-0.01416317, -0.08318184]], dtype=float32),
array([ 0.05734033, 0.04822996, 0.04391507, -0.01550511, 0.05383257,
0.05043739, -0.04092903, -0.0081823 , -0.06425817, 0.02402171,
-0.00374672, -0.06069579, -0.08422226, 0.02909392, -0.02071654,
0.0422841 , -0.05020861, 0.01267704, 0.0365625 , -0.01743891,
-0.01030697, 0.00639807, -0.01493454, 0.03214667, 0.03262959,
0.07799669, 0.05789128, 0.01754347, -0.07558075, 0.0466203 ,
-0.05332188, 0.00270758], dtype=float32)]
After training with
model.fit(x_train,y_train, epochs=3, verbose=1)
print(model.layers[2].get_weights())
I find the following output.
[array([[ 0.00652369, -0.02321771, 0.01399945, ..., -0.07599965,
-0.04356881, -0.0333882 ],
[-0.03132245, -0.05264733, 0.05576386, ..., -0.03755575,
0.07358163, -0.02338506],
[-0.01808248, 0.04092623, 0.02177643, ..., 0.00971264,
0.07631209, 0.0495184 ],
...,
[-0.03780914, 0.00219346, 0.04460619, ..., -0.06703794,
0.03407502, -0.01071112],
[-0.0012739 , -0.0683699 , -0.06152753, ..., 0.05373723,
0.03079057, 0.00855774],
[ 0.06245673, -0.07649396, 0.06748571, ..., -0.06948434,
-0.01416317, -0.08318184]], dtype=float32),
array([-0.250459 , -0.21746232, 0.01250297, 0.00065066, -0.09093136,
0.04943814, -0.13446714, -0.11985168, 0.23259214, -0.14288908,
0.03274751, 0.1462888 , -0.2206902 , 0.14455307, 0.17767513,
0.11378342, -0.22250313, 0.11601174, -0.1855521 , 0.0900097 ,
0.21218981, -0.03386492, -0.06818825, 0.34211585, -0.24891953,
0.08827516, 0.2806849 , 0.07634751, -0.32905066, -0.1860122 ,
0.06170518, -0.20212872], dtype=float32)]
I can see that the bias values have changed but the weight values are static. I'm not sure at all why this is occurring.
What you're trying is a multilayer perceptron (MLP). An MLP is usually composed of one (passthrough) input layer, one or more layers of TLUs called hidden layers, and one final layer of TLUs called the output layer.
Here the signal flows in only one direction (from the inputs to the outputs), so this architecture is an example of a feedforward neural network (FNN).
See this link, which explains feedforward neural networks.
Coming to the explanation of your code: you are initializing the weights using some initializers. So the first initialization of the weights happens at the hidden layer, and they then get updated in the next Dense layer.
So whatever weights are initialized will remain the same even after training in the hidden layer, since it is a feedforward neural network, meaning it does not depend on the output of the current layer.
But if you want to check your code, you can include one more hidden layer exactly like the one that is present and look at the weights of layer 3 (hidden layer 2), which looks something like this:
inp = Input(shape=(x_train.shape[1:]))
flat = Flatten()(inp)
hid = myWeights(32)(flat)
hid2 = myWeights(32)(hid)
out = Dense(10, 'softmax')(hid2)
model = Model(inp, out)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Then printing the weights before and after fit for the hidden2 layer will give you different weights, since the weights of the hidden 2 layer depend on the output of the hidden 1 layer.
print(model.layers[3].get_weights())
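As a quick numerical check (my own sketch, not part of the original posts), you can compare the full kernel of the layer in question before and after training instead of relying on the truncated printout (here layers[2], the myWeights layer of the original single-hidden-layer model):

import numpy as np

w_before = model.layers[2].get_weights()[0].copy()   # kernel before training
model.fit(x_train, y_train, epochs=3, verbose=1)
w_after = model.layers[2].get_weights()[0]           # kernel after training
print(np.allclose(w_before, w_after))                # True only if no weight moved at all
print(np.abs(w_after - w_before).max())              # largest absolute change in any weight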
Related
I need to export the resulting tensor, for which I used this code:
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

def get_embeddings(model_name, tokenizer, name, inp):
    tokenizer = tokenizer.from_pretrained(name)
    model = model_name.from_pretrained(name)
    input_ids = tf.constant(tokenizer.encode(inp))[None, :]  # Batch size 1
    outputs = model(input_ids)
    last_hidden_states = outputs[0]
    cls_token = last_hidden_states[0]
    return cls_token

cls_token = get_embeddings(TFBertModel, BertTokenizer, 'bert-base-uncased', z[0])
cls_token
I received the following output:
Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.
<tf.Tensor: shape=(21, 768), dtype=float32, numpy=
array([[-0.24550161, -0.34956726, 0.01089635, ..., -0.38017362,
-0.03965453, 0.41104677],
[ 0.4436314 , -0.3720695 , 0.27837285, ..., 0.7340785 ,
-0.02534109, -0.24059379],
[ 0.00089747, -0.18920937, 0.83858776, ..., 0.58318835,
-0.03517005, -0.29172006],
...,
[-0.06368805, 0.00210648, 0.52235216, ..., 0.32049924,
-0.06555019, 0.20605275],
[-0.10185663, -0.53307414, -0.37091127, ..., -0.17225765,
-0.45891476, 0.30040386],
[ 0.59691334, 0.12757768, -0.27682877, ..., -0.07072508,
-0.6099813 , -0.00861905]], dtype=float32)>
I want to export/view the full tensor either in array or CSV format, preferably the latter.
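A minimal way to do this (my own sketch, not part of the original post, assuming eager execution so .numpy() is available on cls_token) is to convert the tensor to a NumPy array and save it:

import sys
import numpy as np

full_array = cls_token.numpy()                 # (21, 768) NumPy array
np.set_printoptions(threshold=sys.maxsize)     # print the array without the "..." truncation
print(full_array)
np.savetxt('cls_token.csv', full_array, delimiter=',')   # write the full tensor to a CSV file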
My neural network in Keras learns a representation of my original data. In order to see exactly how it learns I thought it would be interesting to plot the data for every training batch (or epoch alternatively) and convert the plots into a video.
I'm stuck on how to get the outputs of my model during the training phase.
I thought about doing something like this (pseudo code):
epochs = 200
plt_outputs = []
for i in range(epochs):
    model.fit(x_train, y_train, epochs=1)
    plt_outputs.append(output_layer(x_test))
where output_layer is the layer in my neural network I'm interested in. Afterwards I would use plot_data to generate each plot and turn it into a video. (That part I'm not concerned about yet.)
But that doesn't strike me as a good solution, plus I don't know how to get the output for every batch. Any thoughts on this?
You can customize what happens in the test step, much like this official tutorial:
import tensorflow as tf
import numpy as np

class CustomModel(tf.keras.Model):
    def test_step(self, data):
        # Unpack the data
        x, y = data
        # Compute predictions
        y_pred = self(x, training=False)
        test_outputs.append(y_pred)  # ADD THIS HERE
        # Updates the metrics tracking the loss
        self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        # Update the metrics.
        self.compiled_metrics.update_state(y, y_pred)
        # Return a dict mapping metric names to current value.
        # Note that it will include the loss (tracked in self.metrics).
        return {m.name: m.result() for m in self.metrics}

# Construct an instance of CustomModel
inputs = tf.keras.Input(shape=(8,))
x = tf.keras.layers.Dense(8, activation='relu')(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model = CustomModel(inputs, outputs)
model.compile(loss="mse", metrics=["mae"], run_eagerly=True)

test_outputs = list()  # ADD THIS HERE

# Evaluate with our custom test_step
x = np.random.random((1000, 8))
y = np.random.random((1000, 1))
model.evaluate(x, y)
I added a list, and now in the test step it appends the output to this list. You will need to add run_eagerly=True in model.compile() for this to work. This will produce a list of such outputs:
<tf.Tensor: shape=(32, 1), dtype=float32, numpy=
array([[ 0.10866462],
[ 0.2749035 ],
[ 0.08196291],
[ 0.25862294],
[ 0.30985728],
[ 0.20230596],
...
[ 0.17108777],
[ 0.29692617],
[-0.03684975],
[ 0.03525433],
[ 0.26774448],
[ 0.21728781],
[ 0.0840873 ]], dtype=float32)>
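The same idea carries over to the training phase: override train_step instead and append each batch's predictions while fitting. This is my own sketch, not part of the original answer, and assumes the same TF 2.x compiled_loss / compiled_metrics API used above; compile with run_eagerly=True as before so the appended tensors are ordinary eager tensors.

train_outputs = list()

class CustomTrainModel(tf.keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)                  # forward pass
            loss = self.compiled_loss(y, y_pred,
                                      regularization_losses=self.losses)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        train_outputs.append(y_pred)                         # capture this batch's outputs
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}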
I have the following NN:
cc = Input(shape=(3,))
dd = Dense(1,activation='tanh')(cc)
dense_model3 = Model(inputs=cc, outputs=dd)
# Compile
dense_model3.compile(optimizer='adam', loss='mean_squared_error')
dense_model3.fit(copstage3,y_stage9, batch_size=150, epochs=100)
ypredi3 = dense_model3.predict(copstage3,batch_size=150, steps = None)
and when I use dense_model3.get_weights(), I get:
[array([[0.15411839],
       [1.072346  ],
       [0.37893268]], dtype=float32), array([-0.13432428], dtype=float32)]
However, as I have 150 rows in my data, I would expect 150 different weights, one representing each row. What am I missing?
Your model has input of size 3,
cc = Input(shape=(3,))
And output of size 1,
dd = Dense(1,activation='tanh')(cc)
There are no intermediate layers, so the weights are associated with the three inputs and the one output, as given:
[array([[0.15411839],
       [1.072346  ],
       [0.37893268]], dtype=float32), array([-0.13432428], dtype=float32)]
Here,
array([[0.15411839], [1.072346 ], [0.37893268]], dtype=float32)
is the kernel of the Dense layer (one weight per input feature, each connecting to the single output unit), and
array([-0.13432428], dtype=float32)
is the bias of that output unit.
The 150 rows of data are used to train this layer; after training, the weights are still associated with the individual neurons and their connections, not with the rows of data.
Hope this helps.
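As a concrete check (my own sketch, not part of the original answer, assuming the tf.keras imports), the parameter count follows from the layer sizes alone: a Dense(1) on a 3-feature input has 3 x 1 kernel weights plus 1 bias, i.e. 4 parameters, no matter how many rows you train on.

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

cc = Input(shape=(3,))
dd = Dense(1, activation='tanh')(cc)
m = Model(inputs=cc, outputs=dd)
m.summary()   # Total params: 4  ->  3 * 1 kernel weights + 1 bias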
I have used Keras and TensorFlow to classify Fashion MNIST following this tutorial.
It uses the AdamOptimizer to find the values of the model parameters that minimize the loss function of the network. The input to the network is a 2-D tensor with shape [28, 28], and the output is a 1-D tensor with shape [10], which is the result of a softmax function.
Once the network has been trained, I want to use the optimizer for another task: find an input that maximizes one of the elements of the output tensor. How can this be done? Is it possible to do so using Keras, or does one have to use a lower-level API?
Since the input is not unique for a given output, it would be even better if we could impose some constraints on the values the input can take.
The trained model has the following format
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
I feel you would want to backprop with respect to the input, freezing all the weights of your model. What you could do is:
Add a dense layer after the input layer with the same dimensions as input and set it as trainable
Freeze all the other layers of your model. (except the one you added)
As an input, feed an identity matrix and train your model based on whatever output you desire.
This article and this post might be able to help you if you want to backprop based on the input instead. It's a bit like what you are aiming for but you can get the intuition.
It would be very similar to the way that the filters of a convolutional network are visualized: we would do gradient ascent optimization in input space to maximize the response of a particular filter.
Here is how to do it: after training is finished, first we need to specify the output and define a loss function that we want to maximize:
from keras import backend as K
output_class = 0 # the index of the output class we want to maximize
output = model.layers[-1].output
loss = K.mean(output[:,output_class]) # get the average activation of our desired class over the batch
Next, we need to take the gradient of the loss we have defined above with respect to the input layer:
grads = K.gradients(loss, model.input)[0] # the output of `gradients` is a list, just take the first (and only) element
grads = K.l2_normalize(grads) # normalize the gradients to help have a smooth optimization process
Next, we need to define a backend function that takes the initial input image and gives the values of loss and gradients as outputs, so that we can use it in the next step to implement the optimization process:
func = K.function([model.input], [loss, grads])
Finally, we implement the gradient ascent optimization process:
import numpy as np

input_img = np.random.random((1, 28, 28))  # define an initial random image
lr = 1.  # learning rate used for gradient updates
max_iter = 50  # number of gradient updates iterations

for i in range(max_iter):
    loss_val, grads_val = func([input_img])
    input_img += grads_val * lr  # update the image based on gradients
Note that, after this process is finished, to display the image you may need to make sure that all the values in the image are in the range [0, 255] (or [0,1]).
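For example (my own sketch, not part of the original answer), you could rescale the result to [0, 1] before plotting:

import matplotlib.pyplot as plt

img = input_img[0]
img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # rescale to [0, 1]
plt.imshow(img, cmap='gray')
plt.show()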
After the hints Saket Kumar Singh gave in his answer, I wrote the following, which seems to solve the question.
I create two custom layers. Maybe Keras already offers some classes that are equivalent to them.
The first one is a trainable input:
import numpy as np
import tensorflow as tf
from tensorflow import keras

class MyInputLayer(keras.layers.Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyInputLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.kernel = self.add_weight(name='kernel',
                                      shape=self.output_dim,
                                      initializer='uniform',
                                      trainable=True)
        super(MyInputLayer, self).build(input_shape)

    def call(self, x):
        # ignore the actual input and return the trainable kernel instead
        return self.kernel

    def compute_output_shape(self, input_shape):
        return self.output_dim
The second one gets the probability of the label of interest:
class MySelectionLayer(keras.layers.Layer):
    def __init__(self, position, **kwargs):
        self.position = position
        self.output_dim = 1
        super(MySelectionLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        super(MySelectionLayer, self).build(input_shape)

    def call(self, x):
        mask = np.array([False] * x.shape[-1])
        mask[self.position] = True
        return tf.boolean_mask(x, mask, axis=1)

    def compute_output_shape(self, input_shape):
        return self.output_dim
I used them in this way:
# Build the model
layer_flatten = keras.layers.Flatten(input_shape=(28, 28))
layerDense1 = keras.layers.Dense(128, activation=tf.nn.relu)
layerDense2 = keras.layers.Dense(10, activation=tf.nn.softmax)
model = keras.Sequential([
    layer_flatten,
    layerDense1,
    layerDense2
])

# Compile the model
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
# ...

# Freeze the model
layerDense1.trainable = False
layerDense2.trainable = False

# Build another model
class_index = 7
layerInput = MyInputLayer((1, 784))
layerSelection = MySelectionLayer(class_index)

model_extended = keras.Sequential([
    layerInput,
    layerDense1,
    layerDense2,
    layerSelection
])

# Compile it
model_extended.compile(optimizer=tf.train.AdamOptimizer(),
                       loss='mean_absolute_error')

# Train it
dummyInput = np.ones((1, 1))
target = np.ones((1, 1))
model_extended.fit(dummyInput, target, epochs=300)

# Retrieve the weights of layerInput
layerInput.get_weights()[0]
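To view the optimized input as an image (my own follow-up sketch, assuming the 28x28 MNIST-shaped input from the question):

import matplotlib.pyplot as plt

optimized_input = layerInput.get_weights()[0].reshape(28, 28)  # the kernel has shape (1, 784)
plt.imshow(optimized_input, cmap='gray')
plt.show()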
Interesting. Maybe a solution would be to feed all your data to the network and for each sample save the output_layer after softmax.
This way, for 3 classes, where you want to find the best input for class 1, you are looking for outputs where the first component is high. For example: [1 0 0]
Indeed, the output represents the probability, or the confidence of the network, that the sample belongs to each of the classes.
Funny coincidence, I was just working on the same "problem". I'm interested in the direction of adversarial training etc. What I did was insert a LocallyConnected2D layer after the input and then train with data that is all ones and has the class of interest as its target.
As the model I use:
batch_size = 64
num_classes = 10
epochs = 20
input_shape = (28, 28, 1)

inp = tf.keras.layers.Input(shape=input_shape)
conv1 = tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal')(inp)
pool1 = tf.keras.layers.MaxPool2D((2, 2))(conv1)
drop1 = tf.keras.layers.Dropout(0.20)(pool1)
flat = tf.keras.layers.Flatten()(drop1)
fc1 = tf.keras.layers.Dense(128, activation='relu')(flat)
norm1 = tf.keras.layers.BatchNormalization()(fc1)
dropfc1 = tf.keras.layers.Dropout(0.25)(norm1)
out = tf.keras.layers.Dense(num_classes, activation='softmax')(dropfc1)

model = tf.keras.models.Model(inputs=inp, outputs=out)
model.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.keras.optimizers.RMSprop(),
              metrics=['accuracy'])
model.summary()
after training I insert the new layer
def insert_intermediate_layer_in_keras(model, new_layer, before_layer_id):
    layers = [l for l in model.layers]
    if before_layer_id == 0:
        x = new_layer
    else:
        x = layers[0].output
    for i in range(1, len(layers)):
        if i == before_layer_id:
            x = new_layer(x)  # insert the new layer just before layer `before_layer_id`
            x = layers[i](x)
        else:
            x = layers[i](x)
    new_model = tf.keras.models.Model(inputs=layers[0].input, outputs=x)
    return new_model

def fix_model(model):
    # freeze all layers of the trained model
    for l in model.layers:
        l.trainable = False

fix_model(model)

new_layer = tf.keras.layers.LocallyConnected2D(1, kernel_size=(1, 1),
                                               activation='linear',
                                               kernel_initializer='he_normal',
                                               use_bias=False)
new_model = insert_intermediate_layer_in_keras(model, new_layer, 1)
new_model.compile(loss=tf.keras.losses.categorical_crossentropy,
                  optimizer=tf.keras.optimizers.RMSprop(),
                  metrics=['accuracy'])
and finally rerun training with my fake data.
import matplotlib.pyplot as plt

X_fake = np.ones((60000, 28, 28, 1))
y_fake = np.ones((60000))
Y_fake = tf.keras.utils.to_categorical(y_fake, num_classes)
new_model.fit(X_fake, Y_fake, epochs=100)

weights = new_layer.get_weights()[0]
plt.imshow(weights.reshape(28, 28))
plt.show()
Results are not yet satisfying but I'm confident of the approach and guess I need to play around with the optimiser.
I want to have a model that only predicts a certain syntactic category, for example verbs. Can I update the weights of the LSTM so that they are set to 1 if the word is a verb and 0 if it is any other category?
This is my current code:
model = Sequential()
model.add(Embedding(vocab_size, embedding_size, input_length=5, weights=[pretrained_weights]))
model.add(Bidirectional(LSTM(units=embedding_size)))
model.add(Dense(2000, activation='softmax'))

for e in zip(model.layers[-1].trainable_weights, model.layers[-1].get_weights()):
    print('Param %s:\n%s' % (e[0], e[1]))

weights = [layer.get_weights() for layer in model.layers]
print(weights)
print(model.summary())

# compile network
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(lr=0.001),
              metrics=['accuracy'])

# fit network
history = model.fit(X_train_fit, y_train_fit, epochs=100, verbose=2, validation_data=(X_val, y_val))
score = model.evaluate(x=X_test, y=y_test, batch_size=32)
These are the weights that I am returning:
Param <tf.Variable 'dense_1/kernel:0' shape=(600, 2000) dtype=float32_ref>:
[[-0.00803087 0.0332068 -0.02052244 ... 0.03497869 0.04023124
-0.02789269]
[-0.02439511 0.02649114 0.00163587 ... -0.01433908 0.00598045
0.00556619]
[-0.01622458 -0.02026448 0.02620039 ... 0.03154427 0.00676246
0.00236203]
...
[-0.00233192 0.02012364 -0.01562861 ... -0.01857186 -0.02323328
0.01365903]
[-0.02556716 0.02962652 0.02400535 ... -0.01870854 -0.04620285
-0.02111554]
[ 0.01415684 -0.00216265 0.03434955 ... 0.01771339 0.02930249
0.002172 ]]
Param <tf.Variable 'dense_1/bias:0' shape=(2000,) dtype=float32_ref>:
[0. 0. 0. ... 0. 0. 0.]
[[array([[-0.023167 , -0.0042483, -0.10572 , ..., 0.089398 , -0.0159 ,
0.14866 ],
[-0.11112 , -0.0013859, -0.1778 , ..., 0.063374 , -0.12161 ,
0.039339 ],
[-0.065334 , -0.093031 , -0.017571 , ..., 0.16642 , -0.13079 ,
0.035397 ],
and so on.
Can I do it by updating the weights? Or is there a more efficient way to be able to only output verbs?
Thank you for the help!
In this model, with this loss (categorical_crossentropy), you cannot learn verb/non-verb labels without supervision, so you need labeled data. Perhaps you can use a tagged corpus, e.g. the Penn Treebank corpus, to train this model so that it takes the input words and predicts the output labels (a closed class of labels).
If you want one tag, as a regression on each word, you can change the model so that the last layer outputs a value between 0 and 1:
model.add(Dense(1, activation='sigmoid'))
Then change the loss function to be a binary:
# compile network
model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(lr=0.001),
              metrics=['accuracy'])
Then instead of labels, you should have 1 and 0 values in y_train_fit representing verb/non-verb of each word.
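Putting these pieces together, a minimal sketch of the binary setup (my own, assuming the standalone Keras imports and that vocab_size, embedding_size, pretrained_weights, and the 0/1 labels in y_train_fit come from the question's setup):

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense
from keras.optimizers import RMSprop

model = Sequential()
model.add(Embedding(vocab_size, embedding_size, input_length=5, weights=[pretrained_weights]))
model.add(Bidirectional(LSTM(units=embedding_size)))
model.add(Dense(1, activation='sigmoid'))   # 1 = verb, 0 = any other category

model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(lr=0.001),
              metrics=['accuracy'])

# y_train_fit now holds 0/1 verb labels instead of the 2000-class targets
history = model.fit(X_train_fit, y_train_fit, epochs=100, verbose=2,
                    validation_data=(X_val, y_val))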