I'm trying to recreate a transformer written in PyTorch and implement it in TensorFlow. The problem is that, even though I followed the documentation for both the PyTorch and TensorFlow versions, the outputs still come out quite different.
I wrote a little code snippet to show the issue:
import torch
import tensorflow as tf
import numpy as np
class TransformerLayer(tf.Module):
    def __init__(self, d_model, nhead, dropout=0):
        super(TransformerLayer, self).__init__()
        self.self_attn = torch.nn.MultiheadAttention(d_model, nhead, dropout=dropout)
batch_size = 2
seq_length = 5
d_model = 10
src = np.random.uniform(size=(batch_size, seq_length, d_model))
srcTF = tf.convert_to_tensor(src)
srcPT = torch.Tensor(src.reshape((seq_length, batch_size, d_model)))
self_attnTF = tf.keras.layers.MultiHeadAttention(key_dim=10, num_heads=5, dropout=0)
transformer_encoder = TransformerLayer(d_model=10, nhead=5, dropout=0.0)
output, scores = self_attnTF(srcTF, srcTF, srcTF, return_attention_scores=True)
print("Tensorflow Attendtion outputs:", output)
print("Tensorflow (averaged) weights:", tf.math.reduce_mean(scores, 1))
print("Torch Attendtion outputs:", transformer_encoder.self_attn(srcPT,srcPT,srcPT)[0])
print("Torch attention output weights:", transformer_encoder.self_attn(srcPT,srcPT,srcPT)[1])
and the result is:
Tensorflow Attendtion outputs: tf.Tensor(
[[[ 0.02602757 -0.14134401 0.00855263 0.4735083 -0.01851891
-0.20382246 -0.18152176 -0.21076852 0.08623976 -0.33548725]
[ 0.02607442 -0.1403394 0.00814065 0.47415024 -0.01882939
-0.20353754 -0.18291879 -0.21234266 0.08595885 -0.33613583]
[ 0.02524654 -0.14096384 0.00870436 0.47411725 -0.01800703
-0.20486829 -0.18163288 -0.21082559 0.08571021 -0.3362339 ]
[ 0.02518575 -0.14039244 0.0090138 0.47431853 -0.01775141
-0.20391947 -0.18138805 -0.2118245 0.08432849 -0.33521986]
[ 0.02556361 -0.14039293 0.00876258 0.4746476 -0.01891363
-0.20398234 -0.18229616 -0.21147579 0.08555281 -0.33639923]]
[[ 0.07844199 -0.1614371 0.01649148 0.5287745 0.05126739
-0.13851154 -0.09829871 -0.1621251 0.01922669 -0.2428589 ]
[ 0.07844222 -0.16024739 0.01805423 0.52941847 0.04975721
-0.13537636 -0.09829231 -0.16129729 0.01979005 -0.24491176]
[ 0.07800542 -0.160701 0.01677295 0.52902794 0.05082911
-0.13843337 -0.09805533 -0.16165744 0.01928401 -0.24327613]
[ 0.07815789 -0.1600025 0.01757433 0.5291927 0.05032986
-0.1368022 -0.09849522 -0.16172451 0.01929555 -0.24438493]
[ 0.0781548 -0.16028519 0.01764914 0.52846324 0.04941286
-0.13746066 -0.09787872 -0.16141161 0.01994199 -0.2440269 ]]], shape=(2, 5, 10), dtype=float32)
Tensorflow (averaged) weights: tf.Tensor(
[[[0.199085 0.20275716 0.20086522 0.19873264 0.19856 ]
[0.2015336 0.19960018 0.20218948 0.19891861 0.19775811]
[0.19906266 0.20318432 0.20190334 0.19812575 0.19772394]
[0.20074987 0.20104568 0.20269363 0.19744729 0.19806348]
[0.19953248 0.20176074 0.20314851 0.19782843 0.19772986]]
[[0.2010009 0.20053487 0.20004745 0.20092985 0.19748697]
[0.20034568 0.20035927 0.19955876 0.20062163 0.19911464]
[0.19967113 0.2006859 0.20012529 0.20047483 0.19904283]
[0.20132652 0.19996871 0.20019794 0.20008174 0.19842513]
[0.2006393 0.20000939 0.19938737 0.20054278 0.19942114]]], shape=(2, 5, 5), dtype=float32)
Torch Attendtion outputs: tensor([[[ 0.1097, -0.4467, -0.0719, -0.1779, -0.0766, -0.1247, 0.1557,
0.0051, -0.3932, -0.1323],
[ 0.1264, -0.3822, 0.0759, -0.0335, -0.1084, -0.1539, 0.1475,
-0.0272, -0.4235, -0.1744]],
[[ 0.1122, -0.4502, -0.0747, -0.1796, -0.0756, -0.1271, 0.1581,
0.0049, -0.3964, -0.1340],
[ 0.1274, -0.3823, 0.0754, -0.0356, -0.1091, -0.1547, 0.1477,
-0.0272, -0.4252, -0.1752]],
[[ 0.1089, -0.4427, -0.0728, -0.1746, -0.0756, -0.1202, 0.1501,
0.0031, -0.3894, -0.1242],
[ 0.1263, -0.3820, 0.0718, -0.0374, -0.1063, -0.1562, 0.1485,
-0.0271, -0.4233, -0.1761]],
[[ 0.1061, -0.4369, -0.0685, -0.1696, -0.0772, -0.1173, 0.1454,
0.0012, -0.3860, -0.1201],
[ 0.1265, -0.3820, 0.0762, -0.0325, -0.1082, -0.1560, 0.1501,
-0.0271, -0.4249, -0.1779]],
[[ 0.1043, -0.4402, -0.0705, -0.1719, -0.0791, -0.1205, 0.1508,
0.0018, -0.3895, -0.1262],
[ 0.1260, -0.3805, 0.0775, -0.0298, -0.1083, -0.1547, 0.1494,
-0.0276, -0.4242, -0.1768]]], grad_fn=<AddBackward0>)
Torch attention output weights: tensor([[[0.2082, 0.2054, 0.1877, 0.1956, 0.2031],
[0.2100, 0.2079, 0.1841, 0.1943, 0.2037],
[0.2007, 0.1995, 0.1929, 0.1999, 0.2070],
[0.1995, 0.1950, 0.1976, 0.2002, 0.2077],
[0.1989, 0.1969, 0.1970, 0.2024, 0.2048]],
[[0.2095, 0.1902, 0.1987, 0.2027, 0.1989],
[0.2090, 0.1956, 0.1997, 0.2004, 0.1952],
[0.2047, 0.1869, 0.2006, 0.2121, 0.1957],
[0.2073, 0.1953, 0.1982, 0.2014, 0.1978],
[0.2089, 0.2003, 0.1953, 0.1957, 0.1998]]], grad_fn=<DivBackward0>)
The output weights look similar but the base attention outputs are way off. Is there any way to make the Tensorflow model come out more like the Pytorch one? Any help would be greatly appreciated!
In MultiHeadAttention there is also a projection layer, like
Q = W_q @ input_query + b_q
K = W_k @ input_keys + b_k
V = W_v @ input_values + b_v
The matrices W_q, W_k and W_v and the biases b_q, b_k and b_v are initialized randomly, so a difference in outputs should be expected (even between the outputs of two distinct PyTorch layers on the same input). After the self-attention operation there is one more projection, and it is also initialized randomly. The weights can be set manually in TensorFlow by calling the set_weights method of self_attnTF.
The correspondence between the weights in tf.keras.layers.MultiHeadAttention and nn.MultiheadAttention is not obvious. For example, torch stores the input projections of all heads packed together in shared matrices, while tf keeps a separate per-head axis in its kernels. So if you take the weights of a pretrained PyTorch model and try to load them into a TensorFlow model (for whatever reason), it will certainly take more than five minutes.
The results should be the same if, after initializing the PyTorch and TensorFlow models, you step through their parameters and assign them identical values.
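To make the point about parameters concrete, here is a minimal NumPy sketch of multi-head self-attention with explicit weights. It is not the code of either library: the names `init_params` and `mha` are made up for illustration, and the packed `(3 * d_model, d_model)` input projection is only modelled on the layout PyTorch uses. Two independently initialized parameter sets give different outputs on the same input, while reusing the same parameters gives identical outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(d_model, rng):
    """Random parameters; q, k and v projections are stacked in one matrix."""
    return {
        "in_w": rng.normal(scale=0.3, size=(3 * d_model, d_model)),
        "in_b": np.zeros(3 * d_model),
        "out_w": rng.normal(scale=0.3, size=(d_model, d_model)),
        "out_b": np.zeros(d_model),
    }

def mha(x, p, nhead):
    """Self-attention on x of shape (batch, seq, d_model)."""
    b, s, d = x.shape
    head_dim = d // nhead  # d_model is split across the heads

    qkv = x @ p["in_w"].T + p["in_b"]            # (b, s, 3 * d)
    q, k, v = np.split(qkv, 3, axis=-1)          # each (b, s, d)

    def to_heads(t):
        # (b, s, d) -> (b, nhead, s, head_dim)
        return t.reshape(b, s, nhead, head_dim).transpose(0, 2, 1, 3)

    q, k, v = to_heads(q), to_heads(k), to_heads(v)
    scores = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim))
    context = (scores @ v).transpose(0, 2, 1, 3).reshape(b, s, d)
    out = context @ p["out_w"].T + p["out_b"]    # final output projection
    return out, scores.mean(axis=1)              # weights averaged over heads

rng = np.random.default_rng(0)
src = rng.uniform(size=(2, 5, 10))
p1 = init_params(10, rng)
p2 = init_params(10, rng)

out1, w1 = mha(src, p1, nhead=5)
out2, w2 = mha(src, p2, nhead=5)
out1_again, _ = mha(src, p1, nhead=5)

print(np.allclose(out1, out2))        # False: different random parameters
print(np.allclose(out1, out1_again))  # True: identical parameters
```

When matching the two real layers, note that key_dim in tf.keras.layers.MultiHeadAttention is the size of each head, whereas nn.MultiheadAttention splits d_model across the heads, so d_model=10 with nhead=5 corresponds to key_dim=2 rather than key_dim=10. Also, reshape does not swap axes; moving the batch axis behind the sequence axis needs a transpose such as np.transpose(src, (1, 0, 2)).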
I need to export the tensor produced by the following code:
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel
def get_embeddings(model_name,tokenizer,name,inp):
    tokenizer = tokenizer.from_pretrained(name)
    model = model_name.from_pretrained(name)
    input_ids = tf.constant(tokenizer.encode(inp))[None, :] # Batch size 1
    outputs = model(input_ids)
    last_hidden_states = outputs[0]
    cls_token=last_hidden_states[0]
    return cls_token
cls_token=get_embeddings(TFBertModel,BertTokenizer,'bert-base-uncased',z[0])
cls_token
I received the following output:
Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.
<tf.Tensor: shape=(21, 768), dtype=float32, numpy=
array([[-0.24550161, -0.34956726, 0.01089635, ..., -0.38017362,
-0.03965453, 0.41104677],
[ 0.4436314 , -0.3720695 , 0.27837285, ..., 0.7340785 ,
-0.02534109, -0.24059379],
[ 0.00089747, -0.18920937, 0.83858776, ..., 0.58318835,
-0.03517005, -0.29172006],
...,
[-0.06368805, 0.00210648, 0.52235216, ..., 0.32049924,
-0.06555019, 0.20605275],
[-0.10185663, -0.53307414, -0.37091127, ..., -0.17225765,
-0.45891476, 0.30040386],
[ 0.59691334, 0.12757768, -0.27682877, ..., -0.07072508,
-0.6099813 , -0.00861905]], dtype=float32)>
I want to export/view the full tensor either in array or CSV format, preferably the latter.
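One possible approach, sketched under assumptions: an eager tensor can be converted with `cls_token.numpy()`, after which plain NumPy handles both the CSV export and the full printout. The random array below is only a stand-in with the same shape as the tensor above, and the file name `embeddings.csv` is hypothetical.

```python
import os
import sys
import tempfile

import numpy as np

# Stand-in for `cls_token.numpy()`: same shape and dtype as the tensor above.
embeddings = np.random.default_rng(0).normal(size=(21, 768)).astype(np.float32)

# Write one row per token, comma separated (hypothetical file name).
csv_path = os.path.join(tempfile.mkdtemp(), "embeddings.csv")
np.savetxt(csv_path, embeddings, delimiter=",", fmt="%.8e")

# Read it back to check the round trip.
restored = np.loadtxt(csv_path, delimiter=",", dtype=np.float32)
print(restored.shape)  # (21, 768)

# To view every element instead of the "..." summary, raise the print threshold.
with np.printoptions(threshold=sys.maxsize):
    full_text = repr(embeddings)
print(len(full_text.splitlines()) > 21)
```

The `fmt` string is a design choice: nine significant digits are enough to represent float32 values, while a shorter format gives smaller files at the cost of precision.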
Actions speak louder than words so here is the MWE:
import tensorflow as tf
n=17
a = tf.random.uniform(shape=[n], dtype=tf.float32)
print(a)
print(tf.sort(a))
When n<=16 it sorts the values just fine, but when n>16 it sorts the list and then sets the values at index 16 and higher to -0. Example output:
tf.Tensor(
[0.41191268 0.48915362 0.65293264 0.6125376 0.00088847 0.03644979
0.13768506 0.528106 0.27231824 0.4003389 0.5799836 0.83420205
0.06494105 0.39109504 0.8135816 0.153288 0.07945895], shape=(17,), dtype=float32)
tf.Tensor(
[ 0.00088847 0.03644979 0.06494105 0.07945895 0.13768506 0.153288
0.27231824 0.39109504 0.4003389 0.41191268 0.48915362 0.528106
0.5799836 0.6125376 0.65293264 0.8135816 -0. ], shape=(17,), dtype=float32)
But when I make the dtype tf.float64 there is no problem for seemingly arbitrary n:
tf.Tensor(
[0.91347295 0.60086058 0.0271204 0.83564393 0.49664206 0.96215479
0.60472639 0.64395121 0.58394402 0.93489432 0.50379539 0.14087138
0.51662724 0.29758834 0.5657154 0.08638131 0.47912787], shape=(17,), dtype=float64)
tf.Tensor(
[0.0271204 0.08638131 0.14087138 0.29758834 0.47912787 0.49664206
0.50379539 0.51662724 0.5657154 0.58394402 0.60086058 0.60472639
0.64395121 0.83564393 0.91347295 0.93489432 0.96215479], shape=(17,), dtype=float64)
Not sure if this is a bug or expected behavior. It does not depend on eager execution. I first noticed the issue when using the TensorFlow Probability percentile function, which gave me -0.0 as the value, so I wrote my own percentile function and observed the same problem (which I suspect is due to this underlying issue with tf.sort). NumPy sorting works fine regardless of the data type, but I was trying to keep things within TensorFlow.
Any reason why this might be happening or should I make a bug report?
Hardware: I am on an M1 Macbook Air using tensorflow 2.5.0
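For a bug report it can help to state precisely what is wrong with the result. A minimal NumPy sketch (the helper name `is_valid_sort` is made up here) checks that a result is in ascending order and contains exactly the input values; the two arrays are copied from the float32 output above.

```python
import numpy as np

def is_valid_sort(original, result):
    """True if `result` is ascending and is a permutation of `original`."""
    original = np.asarray(original)
    result = np.asarray(result)
    ordered = bool(np.all(result[:-1] <= result[1:]))
    same_values = np.array_equal(np.sort(original), np.sort(result))
    return ordered and same_values

a = np.array([0.41191268, 0.48915362, 0.65293264, 0.6125376, 0.00088847,
              0.03644979, 0.13768506, 0.528106, 0.27231824, 0.4003389,
              0.5799836, 0.83420205, 0.06494105, 0.39109504, 0.8135816,
              0.153288, 0.07945895], dtype=np.float32)

# The result printed by tf.sort above, with -0. in the last position.
reported = np.array([0.00088847, 0.03644979, 0.06494105, 0.07945895,
                     0.13768506, 0.153288, 0.27231824, 0.39109504,
                     0.4003389, 0.41191268, 0.48915362, 0.528106,
                     0.5799836, 0.6125376, 0.65293264, 0.8135816,
                     -0.0], dtype=np.float32)

print(is_valid_sort(a, reported))    # False: 0.83420205 was replaced by -0.
print(is_valid_sort(a, np.sort(a)))  # True
```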
I also checked the same with tensorflow==2.5.0 as you mentioned and did not notice any issue.
import tensorflow as tf
print(tf.__version__)
Output:
2.5.0
and then
import tensorflow as tf
n=17
a = tf.random.uniform(shape=[n], dtype=tf.float32)
print(a)
print(tf.sort(a))
Output:
tf.Tensor(
[0.7946081 0.84397626 0.04671419 0.276353 0.8124876 0.66761124
0.21016991 0.28140187 0.22393394 0.20382321 0.667526 0.1714747
0.9672215 0.17870915 0.9914366 0.32059753 0.5422765 ], shape=(17,), dtype=float32)
tf.Tensor(
[0.04671419 0.1714747 0.17870915 0.20382321 0.21016991 0.22393394
0.276353 0.28140187 0.32059753 0.5422765 0.667526 0.66761124
0.7946081 0.8124876 0.84397626 0.9672215 0.9914366 ], shape=(17,), dtype=float32)
with n>16:
import tensorflow as tf
n=18
a = tf.random.uniform(shape=[n], dtype=tf.float32)
print(a)
print(tf.sort(a))
Output:
tf.Tensor(
[0.1922586 0.6136733 0.7517139 0.3762852 0.52895963 0.7804493
0.9869323 0.08194113 0.3963052 0.6049119 0.9553219 0.18031311
0.58210933 0.92059183 0.4442644 0.91004515 0.4451145 0.8300687 ], shape=(18,), dtype=float32)
tf.Tensor(
[0.08194113 0.18031311 0.1922586 0.3762852 0.3963052 0.4442644
0.4451145 0.52895963 0.58210933 0.6049119 0.6136733 0.7517139
0.7804493 0.8300687 0.91004515 0.92059183 0.9553219 0.9869323 ], shape=(18,), dtype=float32)
Would you mind checking again and letting us know whether the issue still persists?
I am trying to convert my tensorflow model (2.0) into tensorflow lite format. My model has two input layers as follows:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import load_model
from tensorflow.keras.layers import Lambda, Input, add, Dot, multiply, dot
from tensorflow.keras.backend import dot, transpose, expand_dims
from tensorflow.keras.models import Model
r1 = Input(shape=[None, 1, 512], name='flatbuffer_data') # I want to take a variable amount of
# 512 float embeddings from my flatbuffer, if the flatbuffer has 4, embeddings then it would
# be inferred as shape=[4, 1, 512], if it has a 100 embeddings, then it is [100, 1, 512].
r2 = Input(shape=[1, 512], name='query_embedding')
#Example code
minus_r1 = Lambda(lambda x: -x, name='invert_value')(r1)
subtracted = add([r2, minus_r1], name='embeddings_diff')
out1 = tf.argsort(subtracted)
out2 = tf.sort(subtracted)
model = Model([r1, r2], [out1, out2])
I am then doing some tensor operations on the layers and saving the models as follows (there is no training and hence no trainable parameters, just some linear algebra ops which I want to port to android)
model.save('combined_model.h5')
This gives me my TensorFlow .h5 model, but when I try to convert it to TensorFlow Lite, I get the following error:
import tensorflow as tf
model = tf.keras.models.load_model('combined_model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
#Error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/aspiring1/.virtualenvs/faiss/lib/python3.6/site-packages/tensorflow_core/lite/python/lite.py", line 446, in convert
"invalid shape '{1}'.".format(_get_tensor_name(tensor), shape_list))
ValueError: None is only supported in the 1st dimension. Tensor 'flatbuffer_data' has invalid shape '[None, None, 1, 512]'.
I know that we had dynamic and static shape inference in TensorFlow 1.x using placeholders. Is there an analogue in TensorFlow 2.x? I'd also appreciate a solution in TensorFlow 1.x.
Some answers and blogs I've read that might help:
Tensorflow: how to save/restore a model?
Understand dynamic and static shape in tensorflow
Understanding tensorflow shapes
Using the first link above I also tried creating a tensorflow 1.x graph and tried saving it using the saved model format, but I don't get the desired results.
You can find my code for the same here: tensorflow 1.x gist code
Full code: https://drive.google.com/file/d/1MN4-FX_-hz3y-UAuf7OTj_XYuVTlsSTP/view?usp=sharing
Why doesn't this work?
I know that we had dynamic and static shape inference in tensorflow 1.x using tensorflow placeholders. Is there an analogue here in tensorflow 2.x
That all still works fine. I think the problem is that tf.lite doesn't handle dynamic shapes. I think it preallocates all its tensors once and reuses them (I could be wrong).
So, first of all that extra dimension:
[None, None, 1, 512]
keras.Input always includes a batch dimension, which tf.lite can handle being unknown (this restriction seems relaxed in tf-nightly).
But lite seems to prefer a batch dimension of 1. If you switch to:
r1 = Input(shape=[4], batch_size=None, name='flatbuffer_data')
r2 = Input(shape=[4], batch_size=1, name='query_embedding')
The model then passes the conversion, but it still fails when you try to execute the tflite model, because the model wants all unknown dimensions to be 1:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
i = tf.lite.Interpreter(model_content=tflite_model)
i.allocate_tensors()
i.get_input_details()
i.set_tensor(0, tf.constant([[0.,0,0,0],[1,1,1,1],[2,2,2,2]]))
i.set_tensor(1, tf.constant([[0.,0,0,0]]))
ValueError: Cannot set tensor: Dimension mismatch. Got 3 but expected 1 for dimension 0 of input 0.
With tf-nightly you can convert the model as you've written it, but that also fails to run, since the unknown dimension is assumed to be 1:
r1 = Input(shape=[None, 4], name='flatbuffer_data')
r2 = Input(shape=[1, 4], name='query_embedding')
...
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
i = tf.lite.Interpreter(model_content=tflite_model)
i.allocate_tensors()
print(i.get_input_details())
i.set_tensor(0, tf.constant([[[0.,0,0,0],[1,1,1,1],[2,2,2,2]]]))
i.set_tensor(1, tf.constant([[[0.,0,0,0]]]))
ValueError: Cannot set tensor: Dimension mismatch. Got 3 but expected 1 for dimension 1 of input 0.
Solution? No. Almost.
I think you need to give that array a size larger than you expect it to be, and pass an int telling your model how many elements to slice out:
n = Input(shape=(), dtype=tf.int32, name='num_inputs')
r1 = Input(shape=[1000, 4], name='flatbuffer_data')
r2 = Input(shape=[4], name='query_embedding')
#Example code
x = tf.reshape(r1, [1000,4])
x = tf.gather(x, tf.range(tf.squeeze(n)))
minus_r1 = Lambda(lambda x: -x, name='invert_value')(x)
subtracted = add([r2, minus_r1], name='embeddings_diff')
out1 = tf.argsort(subtracted, name='argsort')
out2 = tf.sort(subtracted, name="sorted")
model = Model([r1, r2, n], [out1, out2])
Then it works:
import numpy as np
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
i = tf.lite.Interpreter(model_content=tflite_model)
i.allocate_tensors()
for d in i.get_input_details():
    print(d)
a = np.zeros([1000, 4], dtype=np.float32)
a[:3] = [
[0.,0,0,0],
[1,1,1,1],
[2,2,2,2]]
i.set_tensor(0, tf.constant(a[np.newaxis,...], dtype=tf.float32))
i.set_tensor(1, tf.constant([[0.,0,0,0]]))
i.set_tensor(2, tf.constant([3], dtype=tf.int32))
i.invoke()
print()
for d in i.get_output_details():
    print(i.get_tensor(d['index']))
[[ 0. 0. 0. 0.]
[-1. -1. -1. -1.]
[-2. -2. -2. -2.]]
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]
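The padding step on the caller's side can be sketched in plain NumPy. The helper name `pad_embeddings` is hypothetical and the capacity of 1000 simply mirrors the example above; the returned count is the value that would be fed to the `num_inputs` input.

```python
import numpy as np

def pad_embeddings(embeddings, capacity=1000):
    """Pad an (n, dim) array with zero rows up to a fixed capacity.

    Returns the padded array with a leading batch axis of 1 and the true
    number of rows as an int32 array.
    """
    embeddings = np.asarray(embeddings, dtype=np.float32)
    n, dim = embeddings.shape
    if n > capacity:
        raise ValueError(f"{n} embeddings exceed the fixed capacity of {capacity}")
    padded = np.zeros((1, capacity, dim), dtype=np.float32)
    padded[0, :n] = embeddings
    return padded, np.array([n], dtype=np.int32)

padded, count = pad_embeddings([[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]])
print(padded.shape, count)  # (1, 1000, 4) [3]

# The rows the model's gather step would recover from the padded buffer.
print(padded[0, :count[0]])
```

Choosing the capacity is a trade-off: it has to cover the largest input you expect, but every invocation pays for copying the whole buffer.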
OP tried this in a java interpreter and got:
java.lang.IllegalArgumentException: Internal error: Failed to apply delegate: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors.
So we're not quite done yet.
I'd like to reset (randomize) the weights of all layers in my Keras (deep learning) model. The reason is that I want to be able to train the model several times with different data splits without having to do the (slow) model recompilation every time.
Inspired by this discussion, I'm trying the following code:
# Reset weights
for layer in KModel.layers:
    if hasattr(layer,'init'):
        input_dim = layer.input_shape[1]
        new_weights = layer.init((input_dim, layer.output_dim),name='{}_W'.format(layer.name))
        layer.trainable_weights[0].set_value(new_weights.get_value())
However, it only partly works.
Partly, because I've inspected some layer.get_weights() values, and they seem to change. But when I restart the training, the cost values are much lower than the initial cost values on the first run. It's almost like I've succeeded in resetting some of the weights, but not all of them.
Save the initial weights right after compiling the model but before training it:
model.save_weights('model.h5')
and then after training, "reset" the model by reloading the initial weights:
model.load_weights('model.h5')
This gives you an apples to apples model to compare different data sets and should be quicker than recompiling the entire model.
Reset all layers by checking for initializers:
def reset_weights(model):
    import keras.backend as K
    session = K.get_session()
    for layer in model.layers:
        if hasattr(layer, 'kernel_initializer'):
            layer.kernel.initializer.run(session=session)
        if hasattr(layer, 'bias_initializer'):
            layer.bias.initializer.run(session=session)
Update: kernel_initializer is kernel.initializer now.
If you want to truly re-randomize the weights, and not merely restore the initial weights, you can do the following. The code is slightly different depending on whether you're using TensorFlow or Theano.
from keras.initializers import glorot_uniform # Or your initializer of choice
import keras.backend as K
initial_weights = model.get_weights()
backend_name = K.backend()
if backend_name == 'tensorflow':
    k_eval = lambda placeholder: placeholder.eval(session=K.get_session())
elif backend_name == 'theano':
    k_eval = lambda placeholder: placeholder.eval()
else:
    raise ValueError("Unsupported backend")
new_weights = [k_eval(glorot_uniform()(w.shape)) for w in initial_weights]
model.set_weights(new_weights)
I have found the clone_model function that creates a cloned network with the same architecture but new model weights.
Example of use:
model_cloned = tensorflow.keras.models.clone_model(model_base)
Comparing the weights:
original_weights = model_base.get_weights()
print("Original weights", original_weights)
print("========================================================")
print("========================================================")
print("========================================================")
model_cloned = tensorflow.keras.models.clone_model(model_base)
new_weights = model_cloned.get_weights()
print("New weights", new_weights)
If you execute this code several times, you will notice that the cloned model receives new weights each time.
Tensorflow 2 answer:
for ix, layer in enumerate(model.layers):
    if hasattr(model.layers[ix], 'kernel_initializer') and \
            hasattr(model.layers[ix], 'bias_initializer'):
        weight_initializer = model.layers[ix].kernel_initializer
        bias_initializer = model.layers[ix].bias_initializer
        old_weights, old_biases = model.layers[ix].get_weights()
        model.layers[ix].set_weights([
            weight_initializer(shape=old_weights.shape),
            bias_initializer(shape=old_biases.shape)])
Original weights:
model.layers[1].get_weights()[0][0]
array([ 0.4450057 , -0.13564804, 0.35884023, 0.41411972, 0.24866664,
0.07641453, 0.45726687, -0.04410008, 0.33194816, -0.1965386 ,
-0.38438258, -0.13263905, -0.23807487, 0.40130925, -0.07339832,
0.20535922], dtype=float32)
New weights:
model.layers[1].get_weights()[0][0]
array([-0.4607593 , -0.13104361, -0.0372932 , -0.34242013, 0.12066692,
-0.39146423, 0.3247317 , 0.2635846 , -0.10496247, -0.40134245,
0.19276887, 0.2652442 , -0.18802321, -0.18488845, 0.0826562 ,
-0.23322225], dtype=float32)
K.get_session().close()
K.set_session(tf.Session())
K.get_session().run(tf.global_variables_initializer())
Try set_weights.
for example:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import numpy as np
np.random.seed(1234)
from keras.layers import Input
from keras.layers.convolutional import Convolution2D
from keras.models import Model
print("Building Model...")
inp = Input(shape=(1,None,None))
x = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(inp)
output = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(x)
model_network = Model(input=inp, output=output)
w = np.asarray([
[[[
[0,0,0],
[0,2,0],
[0,0,0]
]]]
])
for layer_i in range(len(model_network.layers)):
    print (model_network.layers[layer_i])
for layer_i in range(1,len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w)
input_mat = np.asarray([
[[
[1.,2.,3.,10.],
[4.,5.,6.,11.],
[7.,8.,9.,12.]
]]
])
print("Input:")
print(input_mat)
print("Output:")
print(model_network.predict(input_mat))
w2 = np.asarray([
[[[
[0,0,0],
[0,3,0],
[0,0,0]
]]]
])
for layer_i in range(1,len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w2)
print("Output:")
print(model_network.predict(input_mat))
Build a model with, say, two convolutional layers:
print("Building Model...")
inp = Input(shape=(1,None,None))
x = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(inp)
output = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(x)
model_network = Model(input=inp, output=output)
Then define your weights (I'm using a simple w, but you could use np.random.uniform or anything like that if you want):
w = np.asarray([
[[[
[0,0,0],
[0,2,0],
[0,0,0]
]]]
])
Take a peek at the layers inside the model:
for layer_i in range(len(model_network.layers)):
    print (model_network.layers[layer_i])
Set each weight for each convolutional layer (you'll see that the first layer is actually input and you don't want to change that, that's why the range starts from 1 not zero).
for layer_i in range(1,len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w)
Generate some input for your test and predict the output from your model
input_mat = np.asarray([
[[
[1.,2.,3.,10.],
[4.,5.,6.,11.],
[7.,8.,9.,12.]
]]
])
print("Output:")
print(model_network.predict(input_mat))
You could change it again if you want and check again for the output:
w2 = np.asarray([
[[[
[0,0,0],
[0,3,0],
[0,0,0]
]]]
])
for layer_i in range(1,len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w2)
print("Output:")
print(model_network.predict(input_mat))
Sample output:
Using Theano backend.
Building Model...
<keras.engine.topology.InputLayer object at 0x7fc0c619fd50>
<keras.layers.convolutional.Convolution2D object at 0x7fc0c6166250>
<keras.layers.convolutional.Convolution2D object at 0x7fc0c6150a10>
Weights after change:
[array([[[[ 0., 0., 0.],
[ 0., 2., 0.],
[ 0., 0., 0.]]]], dtype=float32)]
Input:
[[[[ 1. 2. 3. 10.]
[ 4. 5. 6. 11.]
[ 7. 8. 9. 12.]]]]
Output:
[[[[ 4. 8. 12. 40.]
[ 16. 20. 24. 44.]
[ 28. 32. 36. 48.]]]]
Output:
[[[[ 9. 18. 27. 90.]
[ 36. 45. 54. 99.]
[ 63. 72. 81. 108.]]]]
From your peek at .layers you can see that the first layer is input and the others your convolutional layers.
For tf2 the simplest way to actually reset weights would be:
tf_model.set_weights(
    clone_model(tf_model).get_weights()
)
clone_model(), as mentioned by @danielsaromo, returns a new model with its trainable parameters initialized from scratch. We use its weights to reinitialize our model, so no model compilation (or knowledge of its loss or optimizer) is needed.
There are two caveats, though. The first is mentioned in clone_model()'s documentation:
clone_model will not preserve the uniqueness of shared objects within the model (e.g. a single variable attached to two distinct layers will be restored as two separate variables).
The other caveat is that for large models cloning might fail due to memory limits.
To randomly re-initialize the weights of a compiled, untrained model in TF 2.0 (tf.keras):
weights = [glorot_uniform(seed=random.randint(0, 1000))(w.shape) if w.ndim > 1 else w for w in model.get_weights()]
Note the "if w.ndim > 1 else w". You don't want to re-initialize the biases (they stay 0 or 1).
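For reference, the same idea can be written without Keras. The sketch below re-draws every array with more than one dimension from a Glorot (Xavier) uniform distribution with limit sqrt(6 / (fan_in + fan_out)) and leaves 1-D arrays such as biases untouched. The helper names `glorot_uniform_array` and `reinitialize` are hypothetical, and the fan computation assumes the Keras kernel layout, where the last two axes are input and output channels.

```python
import numpy as np

def glorot_uniform_array(shape, rng):
    """Draw an array of the given shape from a Glorot uniform distribution."""
    receptive_field = int(np.prod(shape[:-2]))  # 1 for dense kernels
    fan_in = shape[-2] * receptive_field
    fan_out = shape[-1] * receptive_field
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape).astype(np.float32)

def reinitialize(weights, seed=None):
    """Return new kernels for arrays with ndim > 1; keep 1-D arrays as they are."""
    rng = np.random.default_rng(seed)
    return [glorot_uniform_array(w.shape, rng) if w.ndim > 1 else w
            for w in weights]

# Stand-in for model.get_weights(): one dense kernel and its bias.
old_weights = [np.ones((8, 16), dtype=np.float32),
               np.zeros(16, dtype=np.float32)]
new_weights = reinitialize(old_weights, seed=0)

print(new_weights[0].shape, float(np.abs(new_weights[0]).max()) <= 0.5)
print(new_weights[1] is old_weights[1])  # True: bias left untouched
```

With a real model the result would be passed back with model.set_weights(new_weights). Keeping 1-D arrays also leaves normalization parameters alone, which matches the intent of the condition above.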
Use keras.backend.clear_session().