Could I set a part of a tensor untrainable? - python

It's easy to make a whole tensor untrainable with trainable=False. But can I set only part of a tensor untrainable?
Suppose I have a 2x2 tensor; I want one element untrainable and the other three elements trainable.
Like this (I want the (1,1) element always to be zero, and the other three elements updated by the optimizer):
untrainable trainable
trainable trainable
Thanks.

Short answer: you can't.
Longer answer: you can mimic that effect by setting part of the gradient to zero after the computation of the gradient so that part of the variable is never updated.
Here is an example:
import tensorflow as tf
tf.random.set_seed(0)
model = tf.keras.Sequential([tf.keras.layers.Dense(2, activation="sigmoid", input_shape=(2,), name="first"), tf.keras.layers.Dense(1,activation="sigmoid")])
X = tf.random.normal((1000,2))
y = tf.reduce_sum(X, axis=1)
ds = tf.data.Dataset.from_tensor_slices((X,y))
In that example, the first layer has a weight W of the following:
>>> model.get_layer("first").trainable_weights[0]
<tf.Variable 'first/kernel:0' shape=(2, 2) dtype=float32, numpy=
array([[ 0.13573623, -0.68269 ],
[ 0.8938798 , 0.6792033 ]], dtype=float32)>
We then write a custom training loop that only updates the first row of that weight W:
loss = tf.losses.MSE
opt = tf.optimizers.SGD(1.)  # high learning rate to see the change
for xx, yy in ds.take(1):
    with tf.GradientTape() as tape:
        l = loss(model(xx), yy)
    g = tape.gradient(l, model.get_layer("first").trainable_weights[0])
    gradient_slice = g[:1]  # keep the gradient of the first row
    new_grad = tf.concat([gradient_slice, tf.zeros((1, 2), dtype=tf.float32)], axis=0)  # replace the second row with zeros
    opt.apply_gradients(zip([new_grad], [model.get_layer("first").trainable_weights[0]]))
And then, after running that loop, we can inspect the weights again:
model.get_layer("first").trainable_weights[0]
<tf.Variable 'first/kernel:0' shape=(2, 2) dtype=float32, numpy=
array([[-0.08515069, -0.51738167],
[ 0.8938798 , 0.6792033 ]], dtype=float32)>
And only the first row changed.
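If you want to freeze a single element rather than a whole row, the same idea works with an element-wise mask. A minimal sketch, building on the model, ds, loss and opt defined above (the mask below freezes the top-left entry; adjust it to match the element you want fixed):
# 0 where the weight must stay frozen, 1 where it may be updated.
mask = tf.constant([[0., 1.],
                    [1., 1.]])

w = model.get_layer("first").trainable_weights[0]
for xx, yy in ds.take(1):
    with tf.GradientTape() as tape:
        l = loss(model(xx), yy)
    g = tape.gradient(l, w)
    opt.apply_gradients([(g * mask, w)])  # the masked entry never receives an update
Note that this keeps the masked entry at whatever value it was initialized to; if it must be exactly zero, you could zero it once up front with w.assign(w * mask).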

Related

How to convert ragged tensor to list of tensors inside a graph

I have a ragged tensor with a variable-sized 2nd dimension (N x ? x 4) and I'd like to convert it to a list of tensors.
Below is a function that works, but only when it's not decorated with tf.function. I need this function to run inside a tf graph.
import tensorflow as tf
raggedTensor = tf.ragged.constant([[[0.7688891291618347, 0.3979208469390869, 0.9807137250900269, 0.5825483798980713],
[0.69159334897995, 0.48753976821899414, 0.7804230451583862, 0.5539296865463257]],
[[0.5818965435028076, 0.343869686126709, 0.8541288375854492, 0.6288187503814697],
[0.636405348777771, 0.6720571517944336, 0.7466434240341187, 0.7985518574714661]],
[[0.65436190366745, 0.47322067618370056, 0.9061073660850525, 0.6343377828598022]],
[[0.7395644187927246, 0.6922436356544495, 0.9913792610168457, 1.0],
[0.7860392928123474, 0.44102346897125244, 0.8941574096679688, 0.637432873249054]]])
def convertGT(x):
    out = []
    for i in range(x.nrows()):
        out.append(x[i].to_tensor())
    return out
#runs fine
convertGT(raggedTensor)
[<tf.Tensor: shape=(2, 4), dtype=float32, numpy=
array([[0.7688891 , 0.39792085, 0.9807137 , 0.5825484 ],
[0.69159335, 0.48753977, 0.78042305, 0.5539297 ]], dtype=float32)>,
<tf.Tensor: shape=(2, 4), dtype=float32, numpy=
array([[0.58189654, 0.3438697 , 0.85412884, 0.62881875],
[0.63640535, 0.67205715, 0.7466434 , 0.79855186]], dtype=float32)>,
<tf.Tensor: shape=(1, 4), dtype=float32, numpy=array([[0.6543619 , 0.47322068, 0.90610737, 0.6343378 ]], dtype=float32)>,
<tf.Tensor: shape=(2, 4), dtype=float32, numpy=
array([[0.7395644 , 0.69224364, 0.99137926, 1. ],
[0.7860393 , 0.44102347, 0.8941574 , 0.6374329 ]], dtype=float32)>]
@tf.function
def convertGT(x):
    out = []
    for i in range(x.nrows()):
        out.append(x[i].to_tensor())
    return out
#this will throw the error
convertGT(raggedTensor)
InaccessibleTensorError: The tensor 'Tensor("while/RaggedToTensor/RaggedTensorToTensor:0", shape=(None, None), dtype=float32)' cannot be accessed here: it is defined in another function or code block. Use return values, explicit Python locals or TensorFlow collections to access it. Defined in: FuncGraph(name=while_body_4893, id=140025401503696); accessed from: FuncGraph(name=convertGT, id=140025402568040).
I don't think it is possible to work with lists within tf.function: Python side effects (like appending to lists) only happen the first time you call a Function with a set of inputs. Afterwards, the traced tf.Graph is re-executed without running the Python code.
See here: https://www.tensorflow.org/guide/function#python_side_effects
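A small illustration of that point (a sketch, not from the original post): the append below runs only while the function is being traced, not on later calls with the same input signature.
import tensorflow as tf

appended = []

@tf.function
def f(x):
    appended.append(x)  # Python side effect: executed only during tracing
    return x + 1

f(tf.constant(1))
f(tf.constant(2))
print(len(appended))  # 1, not 2 -- the second call reuses the traced graph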
I managed to come up with a solution, here it is:
@tf.function
def unstackRaggedBBox(ragged):
    # convert the ragged tensor to a zero-padded dense tensor
    _tensor = ragged.to_tensor()
    ret = []
    for i in range(_tensor.shape[0]):
        # create a mask / remove the 0 values
        mask = tf.cast(_tensor[i], dtype=tf.bool)
        row = tf.boolean_mask(_tensor[i], mask, axis=0)
        # reshape the tensor to Nx4
        size = tf.cast(tf.shape(row, out_type=tf.dtypes.int32) / 4, tf.int32)
        row = tf.reshape(row, [size[0], 4])
        ret.append(row)
    return ret
I think the ragged tensor slice operation caused the problem; lists work fine in graphs.
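For reference, a quick usage sketch on the raggedTensor from the question (the shapes in the comment are what I would expect, assuming the zero padding contains no real 0.0 values):
boxes = unstackRaggedBBox(raggedTensor)
for b in boxes:
    print(b.shape)  # (2, 4), (2, 4), (1, 4), (2, 4)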

How to create mini-batches using tensorflow.data.experimental.CsvDataset compatible with model's input shape?

I'm trying to do mini-batch training with tensorflow.data.experimental.CsvDataset in TensorFlow 2, but the tensors' shape doesn't fit my model's input shape.
What is the best way to do mini-batch training from a TensorFlow dataset?
I tried as follows:
# I have a dataset with 4 features and 1 label
feature = tf.data.experimental.CsvDataset(['C:/data/iris_0.csv'], record_defaults=[.0] * 4, header=True, select_cols=[0,1,2,3])
label = tf.data.experimental.CsvDataset(['C:/data/iris_0.csv'], record_defaults=[.0] * 1, header=True, select_cols=[4])
dataset = tf.data.Dataset.zip((feature, label))
# and I try to minibatch training:
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(loss='mse', optimizer='sgd')
model.fit(dataset.repeat(1).batch(3), epochs=1)
I got an error:
ValueError: Error when checking input: expected dense_6_input to have
shape (4,) but got array with shape (1,)
The reason: CsvDataset() returns tensors of shape (features, batch), but I need them to be of shape (batch, features).
Reference code:
for feature, label in dataset.repeat(1).batch(3).take(1):
    print(feature)
# (<tf.Tensor: id=487, shape=(3,), dtype=float32, numpy=array([5.1, 4.9, 4.7], dtype=float32)>, <tf.Tensor: id=488, shape=(3,), dtype=float32, numpy=array([3.5, 3. , 3.2], dtype=float32)>, <tf.Tensor: id=489, shape=(3,), dtype=float32, numpy=array([1.4, 1.4, 1.3], dtype=float32)>, <tf.Tensor: id=490, shape=(3,), dtype=float32, numpy=array([0.2, 0.2, 0.2], dtype=float32)>)
tf.data.experimental.CsvDataset creates a dataset in which each element corresponds to a row of the CSV file and consists of multiple tensors, i.e. a separate tensor for each column. Therefore, you need to use the dataset's map method to stack these column tensors into a single tensor compatible with the input shape expected by the model. Applying the map after batching, each column tensor has shape (batch,), so stacking along axis=1 gives a (batch, features) tensor:
def map_func(features, label):
    return tf.stack(features, axis=1), tf.stack(label, axis=1)

dataset = dataset.batch(BATCH_SIZE).map(map_func)
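A short end-to-end sketch under the question's setup (BATCH_SIZE = 3 is an assumption; feature, label and map_func are as defined above):
BATCH_SIZE = 3

dataset = tf.data.Dataset.zip((feature, label))
dataset = dataset.batch(BATCH_SIZE).map(map_func)

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(loss='mse', optimizer='sgd')
model.fit(dataset, epochs=1)  # each batch: features of shape (3, 4), labels of shape (3, 1)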

Model considering a single tensor as batch input

I've coded a custom TensorFlow model. However, when I pass a single tensor to it, it treats each element of that tensor as a separate input and therefore returns a batch of outputs.
For example, my input tensor has shape [3, 1] with values:
tf.Tensor(
[[0.7001484 ]
[0.2581525 ]
[0.04169908]], shape=(3, 1), dtype=float32)
Corresponding to this I should get a single vector of shape (3,), but what I'm getting is a 3x3 tensor:
tf.Tensor(
[[0.31234854 0.3224371 0.36521438]
[0.32561225 0.3294511 0.3449366 ]
[0.33208787 0.33271718 0.33519495]], shape=(3, 3), dtype=float32)
My model:
class MAE_Model(tf.keras.Model):
    def __init__(self):
        super(MAE_Model, self).__init__()
        self.h_fin = EnsembleBlock()
        self.ipt = tf.keras.layers.InputLayer(input_shape=(3, 1), batch_size=None)
        self.fc_1 = tf.keras.layers.Dense(16, activation='relu')
        self.fc_2 = tf.keras.layers.Dense(16, activation='relu')
        self.classifier = tf.keras.layers.Dense(3, activation='softmax')

    def call(self, inputs):
        x = self.h_fin(inputs)  # returns a vector of shape [3], e.g. x = [1., 2., 3.]
        x = tf.reshape(x, (3, 1))
        print(x)
        x = self.ipt(x)
        x = self.fc_1(x)
        x = self.fc_2(x)
        return self.classifier(x)
Here, print(x) prints a (3, 1) tensor, similar to the example above. Is there any way to solve this? I want the model to consider x as a single input (the whole of it), not as a batch of inputs.
The output of your model makes sense. It gives a 3x3 tensor because the last layer outputs probabilities for 3 classes, so for a batch of 3 inputs you get 3 class probabilities per batch element. If you want the predicted class, take the class with the highest probability; TensorFlow lets you find it easily with
tf.argmax(predictions, axis=-1)
The output of this operation has shape (None,), where None is the batch size.
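For instance, with a made-up (3, 3) batch of softmax outputs (values here are illustrative, not from the question):
import tensorflow as tf

predictions = tf.constant([[0.31, 0.32, 0.37],
                           [0.10, 0.70, 0.20],
                           [0.50, 0.25, 0.25]])

classes = tf.argmax(predictions, axis=-1)
print(classes)  # tf.Tensor([2 1 0], shape=(3,), dtype=int64)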

Keras get_weights() does not return all weights

I have the following NN:
cc = Input(shape=(3,))
dd = Dense(1,activation='tanh')(cc)
dense_model3 = Model(inputs=cc, outputs=dd)
# Compile
dense_model3.compile(optimizer='adam', loss='mean_squared_error')
dense_model3.fit(copstage3,y_stage9, batch_size=150, epochs=100)
ypredi3 = dense_model3.predict(copstage3,batch_size=150, steps = None)
and when I use dense_model3.get_weights(), I get:
[array([[0.15411839],
       [1.072346  ],
       [0.37893268]], dtype=float32), array([-0.13432428], dtype=float32)]
However, as I have 150 rows in my data, I would expect 150 different weights, one representing each row. What am I missing?
Your model has an input of size 3,
cc = Input(shape=(3,))
and an output of size 1,
dd = Dense(1,activation='tanh')(cc)
There are no intermediate layers, so the weights are associated with the three inputs and the one output, as shown:
[array([[0.15411839],
       [1.072346  ],
       [0.37893268]], dtype=float32), array([-0.13432428], dtype=float32)]
where
array([[0.15411839], [1.072346], [0.37893268]], dtype=float32)
is the kernel, i.e. the weights connecting the three inputs to the single output unit, and
array([-0.13432428], dtype=float32)
is the bias of that output unit.
The 150 rows of data are used to train this layer; after training, the weights belong to the layer's connections (one per input plus a bias), not to individual rows of data.
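A quick check of this (a sketch using tf.keras and random placeholder data of the shapes described; the arrays stand in for copstage3 and y_stage9):
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

cc = Input(shape=(3,))
dd = Dense(1, activation='tanh')(cc)
m = Model(inputs=cc, outputs=dd)
m.compile(optimizer='adam', loss='mean_squared_error')

X = np.random.rand(150, 3)  # placeholder for copstage3
y = np.random.rand(150, 1)  # placeholder for y_stage9
m.fit(X, y, batch_size=150, epochs=1, verbose=0)

print([w.shape for w in m.get_weights()])  # [(3, 1), (1,)] regardless of the row count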
Hope this helps.

Reset weights in Keras layer

I'd like to reset (randomize) the weights of all layers in my Keras (deep learning) model. The reason is that I want to be able to train the model several times with different data splits without having to do the (slow) model recompilation every time.
Inspired by this discussion, I'm trying the following code:
# Reset weights
for layer in KModel.layers:
    if hasattr(layer, 'init'):
        input_dim = layer.input_shape[1]
        new_weights = layer.init((input_dim, layer.output_dim), name='{}_W'.format(layer.name))
        layer.trainable_weights[0].set_value(new_weights.get_value())
However, it only partly works.
Partly, because I've inspected some layer.get_weights() values and they do seem to change. But when I restart training, the cost values are much lower than the initial cost values on the first run. It's almost as if I've succeeded in resetting some of the weights, but not all of them.
Save the initial weights right after compiling the model but before training it:
model.save_weights('model.h5')
and then after training, "reset" the model by reloading the initial weights:
model.load_weights('model.h5')
This gives you an apples-to-apples comparison between different data sets and is quicker than recompiling the entire model.
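A minimal sketch of that workflow (the split arrays below are hypothetical placeholders, not from the original answer):
# Save the freshly initialized weights once, right after compiling.
model.save_weights('model.h5')

# X_split_1, y_split_1, X_split_2, y_split_2 are placeholder data splits.
for split_X, split_y in [(X_split_1, y_split_1), (X_split_2, y_split_2)]:
    # Restore the initial weights so every split starts from the same point.
    model.load_weights('model.h5')
    model.fit(split_X, split_y, epochs=10)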
Reset all layers by checking for initializers:
def reset_weights(model):
    import keras.backend as K
    session = K.get_session()
    for layer in model.layers:
        if hasattr(layer, 'kernel_initializer'):
            layer.kernel.initializer.run(session=session)
        if hasattr(layer, 'bias_initializer'):
            layer.bias.initializer.run(session=session)
Update: kernel_initializer is kernel.initializer now.
If you want to truly re-randomize the weights, and not merely restore the initial weights, you can do the following. The code is slightly different depending on whether you're using TensorFlow or Theano.
from keras.initializers import glorot_uniform # Or your initializer of choice
import keras.backend as K
initial_weights = model.get_weights()
backend_name = K.backend()
if backend_name == 'tensorflow':
    k_eval = lambda placeholder: placeholder.eval(session=K.get_session())
elif backend_name == 'theano':
    k_eval = lambda placeholder: placeholder.eval()
else:
    raise ValueError("Unsupported backend")
new_weights = [k_eval(glorot_uniform()(w.shape)) for w in initial_weights]
model.set_weights(new_weights)
I have found the clone_model function that creates a cloned network with the same architecture but new model weights.
Example of use:
model_cloned = tensorflow.keras.models.clone_model(model_base)
Comparing the weights:
original_weights = model_base.get_weights()
print("Original weights", original_weights)
print("========================================================")
print("========================================================")
print("========================================================")
model_cloned = tensorflow.keras.models.clone_model(model_base)
new_weights = model_cloned.get_weights()
print("New weights", new_weights)
If you execute this code several times, you will notice that the cloned model receives new weights each time.
Tensorflow 2 answer:
for ix, layer in enumerate(model.layers):
    if hasattr(model.layers[ix], 'kernel_initializer') and \
            hasattr(model.layers[ix], 'bias_initializer'):
        weight_initializer = model.layers[ix].kernel_initializer
        bias_initializer = model.layers[ix].bias_initializer

        old_weights, old_biases = model.layers[ix].get_weights()

        model.layers[ix].set_weights([
            weight_initializer(shape=old_weights.shape),
            bias_initializer(shape=old_biases.shape)])
Original weights:
model.layers[1].get_weights()[0][0]
array([ 0.4450057 , -0.13564804, 0.35884023, 0.41411972, 0.24866664,
0.07641453, 0.45726687, -0.04410008, 0.33194816, -0.1965386 ,
-0.38438258, -0.13263905, -0.23807487, 0.40130925, -0.07339832,
0.20535922], dtype=float32)
New weights:
model.layers[1].get_weights()[0][0]
array([-0.4607593 , -0.13104361, -0.0372932 , -0.34242013, 0.12066692,
-0.39146423, 0.3247317 , 0.2635846 , -0.10496247, -0.40134245,
0.19276887, 0.2652442 , -0.18802321, -0.18488845, 0.0826562 ,
-0.23322225], dtype=float32)
K.get_session().close()
K.set_session(tf.Session())
K.get_session().run(tf.global_variables_initializer())
Try set_weights.
for example:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import numpy as np
np.random.seed(1234)
from keras.layers import Input
from keras.layers.convolutional import Convolution2D
from keras.models import Model
print("Building Model...")
inp = Input(shape=(1,None,None))
x = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(inp)
output = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(x)
model_network = Model(input=inp, output=output)
w = np.asarray([
[[[
[0,0,0],
[0,2,0],
[0,0,0]
]]]
])
for layer_i in range(len(model_network.layers)):
    print(model_network.layers[layer_i])
for layer_i in range(1, len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w)
input_mat = np.asarray([
[[
[1.,2.,3.,10.],
[4.,5.,6.,11.],
[7.,8.,9.,12.]
]]
])
print("Input:")
print(input_mat)
print("Output:")
print(model_network.predict(input_mat))
w2 = np.asarray([
[[[
[0,0,0],
[0,3,0],
[0,0,0]
]]]
])
for layer_i in range(1, len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w2)
print("Output:")
print(model_network.predict(input_mat))
Build a model with, say, two convolutional layers:
print("Building Model...")
inp = Input(shape=(1,None,None))
x = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(inp)
output = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(x)
model_network = Model(input=inp, output=output)
Then define your weights (I'm using a simple w, but you could use np.random.uniform or anything like that if you want):
w = np.asarray([
[[[
[0,0,0],
[0,2,0],
[0,0,0]
]]]
])
Take a peek at the layers inside the model:
for layer_i in range(len(model_network.layers)):
    print(model_network.layers[layer_i])
Set the weights of each convolutional layer (you'll see that the first layer is actually the input layer and you don't want to change that, which is why the range starts from 1, not 0):
for layer_i in range(1, len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w)
Generate some input for your test and predict the output from your model
input_mat = np.asarray([
[[
[1.,2.,3.,10.],
[4.,5.,6.,11.],
[7.,8.,9.,12.]
]]
])
print("Output:")
print(model_network.predict(input_mat))
You could change it again if you want and check again for the output:
w2 = np.asarray([
[[[
[0,0,0],
[0,3,0],
[0,0,0]
]]]
])
for layer_i in range(1, len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w2)
print("Output:")
print(model_network.predict(input_mat))
Sample output:
Using Theano backend.
Building Model...
<keras.engine.topology.InputLayer object at 0x7fc0c619fd50>
<keras.layers.convolutional.Convolution2D object at 0x7fc0c6166250>
<keras.layers.convolutional.Convolution2D object at 0x7fc0c6150a10>
Weights after change:
[array([[[[ 0., 0., 0.],
[ 0., 2., 0.],
[ 0., 0., 0.]]]], dtype=float32)]
Input:
[[[[ 1. 2. 3. 10.]
[ 4. 5. 6. 11.]
[ 7. 8. 9. 12.]]]]
Output:
[[[[ 4. 8. 12. 40.]
[ 16. 20. 24. 44.]
[ 28. 32. 36. 48.]]]]
Output:
[[[[ 9. 18. 27. 90.]
[ 36. 45. 54. 99.]
[ 63. 72. 81. 108.]]]]
From your peek at .layers you can see that the first layer is the input layer and the others are your convolutional layers.
For tf2 the simplest way to actually reset weights would be:
tf_model.set_weights(
    clone_model(tf_model).get_weights()
)
clone_model(), as mentioned by @danielsaromo, returns a new model with trainable params initialized from scratch; we use its weights to reinitialize our model, so no model compilation (knowledge of its loss or optimizer) is needed.
There are two caveats though. The first is mentioned in clone_model()'s documentation:
clone_model will not preserve the uniqueness of shared objects within the model (e.g. a single variable attached to two distinct layers will be restored as two separate variables).
The second caveat is that for large models, cloning might fail due to memory limits.
To "random" re-initialize weights of a compiled untrained model in TF 2.0 (tf.keras):
weights = [glorot_uniform(seed=random.randint(0, 1000))(w.shape) if w.ndim > 1 else w for w in model.get_weights()]
Note the "if wdim > 1 else w". You don't want to re-initialize the biases (they stay 0 or 1).
Use keras.backend.clear_session().
