Trying to achieve same result with Pytorch and Tensorflow MultiheadAttention

Trying to achieve same result with Pytorch and Tensorflow MultiheadAttention - python

I'm trying to recreate a transformer written in Pytorch and implement it in Tensorflow. The problem is that despite both the documentation for the Pytorch version and Tensorflow version, they still come out pretty differently.
I wrote a little code snippet to show the issue:
import torch
import tensorflow as tf
import numpy as np
class TransformerLayer(tf.Module):
def __init__(self, d_model, nhead, dropout=0):
super(TransformerLayer, self).__init__()
self.self_attn = torch.nn.MultiheadAttention(d_model, nhead, dropout=dropout)
batch_size = 2
seq_length = 5
d_model = 10
src = np.random.uniform(size=(batch_size, seq_length, d_model))
srcTF = tf.convert_to_tensor(src)
srcPT = torch.Tensor(src.reshape((seq_length, batch_size, d_model)))
self_attnTF = tf.keras.layers.MultiHeadAttention(key_dim=10, num_heads=5, dropout=0)
transformer_encoder = TransformerLayer(d_model=10, nhead=5, dropout=0.0)
output, scores = self_attnTF(srcTF, srcTF, srcTF, return_attention_scores=True)
print("Tensorflow Attendtion outputs:", output)
print("Tensorflow (averaged) weights:", tf.math.reduce_mean(scores, 1))
print("Torch Attendtion outputs:", transformer_encoder.self_attn(srcPT,srcPT,srcPT)[0])
print("Torch attention output weights:", transformer_encoder.self_attn(srcPT,srcPT,srcPT)[1])
and the result is:
Tensorflow Attendtion outputs: tf.Tensor(
[[[ 0.02602757 -0.14134401 0.00855263 0.4735083 -0.01851891
-0.20382246 -0.18152176 -0.21076852 0.08623976 -0.33548725]
[ 0.02607442 -0.1403394 0.00814065 0.47415024 -0.01882939
-0.20353754 -0.18291879 -0.21234266 0.08595885 -0.33613583]
[ 0.02524654 -0.14096384 0.00870436 0.47411725 -0.01800703
-0.20486829 -0.18163288 -0.21082559 0.08571021 -0.3362339 ]
[ 0.02518575 -0.14039244 0.0090138 0.47431853 -0.01775141
-0.20391947 -0.18138805 -0.2118245 0.08432849 -0.33521986]
[ 0.02556361 -0.14039293 0.00876258 0.4746476 -0.01891363
-0.20398234 -0.18229616 -0.21147579 0.08555281 -0.33639923]]
[[ 0.07844199 -0.1614371 0.01649148 0.5287745 0.05126739
-0.13851154 -0.09829871 -0.1621251 0.01922669 -0.2428589 ]
[ 0.07844222 -0.16024739 0.01805423 0.52941847 0.04975721
-0.13537636 -0.09829231 -0.16129729 0.01979005 -0.24491176]
[ 0.07800542 -0.160701 0.01677295 0.52902794 0.05082911
-0.13843337 -0.09805533 -0.16165744 0.01928401 -0.24327613]
[ 0.07815789 -0.1600025 0.01757433 0.5291927 0.05032986
-0.1368022 -0.09849522 -0.16172451 0.01929555 -0.24438493]
[ 0.0781548 -0.16028519 0.01764914 0.52846324 0.04941286
-0.13746066 -0.09787872 -0.16141161 0.01994199 -0.2440269 ]]], shape=(2, 5, 10), dtype=float32)
Tensorflow (averaged) weights: tf.Tensor(
[[[0.199085 0.20275716 0.20086522 0.19873264 0.19856 ]
[0.2015336 0.19960018 0.20218948 0.19891861 0.19775811]
[0.19906266 0.20318432 0.20190334 0.19812575 0.19772394]
[0.20074987 0.20104568 0.20269363 0.19744729 0.19806348]
[0.19953248 0.20176074 0.20314851 0.19782843 0.19772986]]
[[0.2010009 0.20053487 0.20004745 0.20092985 0.19748697]
[0.20034568 0.20035927 0.19955876 0.20062163 0.19911464]
[0.19967113 0.2006859 0.20012529 0.20047483 0.19904283]
[0.20132652 0.19996871 0.20019794 0.20008174 0.19842513]
[0.2006393 0.20000939 0.19938737 0.20054278 0.19942114]]], shape=(2, 5, 5), dtype=float32)
Torch Attendtion outputs: tensor([[[ 0.1097, -0.4467, -0.0719, -0.1779, -0.0766, -0.1247, 0.1557,
0.0051, -0.3932, -0.1323],
[ 0.1264, -0.3822, 0.0759, -0.0335, -0.1084, -0.1539, 0.1475,
-0.0272, -0.4235, -0.1744]],
[[ 0.1122, -0.4502, -0.0747, -0.1796, -0.0756, -0.1271, 0.1581,
0.0049, -0.3964, -0.1340],
[ 0.1274, -0.3823, 0.0754, -0.0356, -0.1091, -0.1547, 0.1477,
-0.0272, -0.4252, -0.1752]],
[[ 0.1089, -0.4427, -0.0728, -0.1746, -0.0756, -0.1202, 0.1501,
0.0031, -0.3894, -0.1242],
[ 0.1263, -0.3820, 0.0718, -0.0374, -0.1063, -0.1562, 0.1485,
-0.0271, -0.4233, -0.1761]],
[[ 0.1061, -0.4369, -0.0685, -0.1696, -0.0772, -0.1173, 0.1454,
0.0012, -0.3860, -0.1201],
[ 0.1265, -0.3820, 0.0762, -0.0325, -0.1082, -0.1560, 0.1501,
-0.0271, -0.4249, -0.1779]],
[[ 0.1043, -0.4402, -0.0705, -0.1719, -0.0791, -0.1205, 0.1508,
0.0018, -0.3895, -0.1262],
[ 0.1260, -0.3805, 0.0775, -0.0298, -0.1083, -0.1547, 0.1494,
-0.0276, -0.4242, -0.1768]]], grad_fn=<AddBackward0>)
Torch attention output weights: tensor([[[0.2082, 0.2054, 0.1877, 0.1956, 0.2031],
[0.2100, 0.2079, 0.1841, 0.1943, 0.2037],
[0.2007, 0.1995, 0.1929, 0.1999, 0.2070],
[0.1995, 0.1950, 0.1976, 0.2002, 0.2077],
[0.1989, 0.1969, 0.1970, 0.2024, 0.2048]],
[[0.2095, 0.1902, 0.1987, 0.2027, 0.1989],
[0.2090, 0.1956, 0.1997, 0.2004, 0.1952],
[0.2047, 0.1869, 0.2006, 0.2121, 0.1957],
[0.2073, 0.1953, 0.1982, 0.2014, 0.1978],
[0.2089, 0.2003, 0.1953, 0.1957, 0.1998]]], grad_fn=<DivBackward0>)
The output weights look similar but the base attention outputs are way off. Is there any way to make the Tensorflow model come out more like the Pytorch one? Any help would be greatly appreciated!

In MultiHeadAttention there is also a projection layer, like
Q = W_q # input_query + b_q
K = W_k # input_keys + b_k
V = W_v # input_values + b_v
Matrices W_q, W_k and W_v and biases b_q, b_k, b_v are initialized randomly, so difference in outputs should be expected (even between outputs of two distinct layers in pytorch on same input). After self-attention operation there is one more projection and it's also initialized randomly. Weights can be set manually in tensorflow by calling method set_weights of self_attnTF.
Correspondence between weights in tf.keras.layers.MultiHeadAttention and nn.MultiheadAttention not so clear, as an example: torch shares weights between heads, while tf keeps them unique. So if you are using weights of pretrained model from pytorch and try to put them in tensorflow model (for whatever reason) it'll certainly take more than five minutes.
Results should be the same if after initializing pytorch model and tensorflow model you step through their parameters and assign them identical values.

Related

Exporting/Viewing the whole Tensor (764-d)

I need to export the formed tensor for which I used this code:
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel
def get_embeddings(model_name,tokenizer,name,inp):
tokenizer = tokenizer.from_pretrained(name)
model = model_name.from_pretrained(name)
input_ids = tf.constant(tokenizer.encode(inp))[None, :] # Batch size 1
outputs = model(input_ids)
last_hidden_states = outputs[0]
cls_token=last_hidden_states[0]
return cls_token
cls_token=get_embeddings(TFBertModel,BertTokenizer,'bert-base-uncased',z[0])
cls_token
I received the following output:
Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.
<tf.Tensor: shape=(21, 768), dtype=float32, numpy=
array([[-0.24550161, -0.34956726, 0.01089635, ..., -0.38017362,
-0.03965453, 0.41104677],
[ 0.4436314 , -0.3720695 , 0.27837285, ..., 0.7340785 ,
-0.02534109, -0.24059379],
[ 0.00089747, -0.18920937, 0.83858776, ..., 0.58318835,
-0.03517005, -0.29172006],
...,
[-0.06368805, 0.00210648, 0.52235216, ..., 0.32049924,
-0.06555019, 0.20605275],
[-0.10185663, -0.53307414, -0.37091127, ..., -0.17225765,
-0.45891476, 0.30040386],
[ 0.59691334, 0.12757768, -0.27682877, ..., -0.07072508,
-0.6099813 , -0.00861905]], dtype=float32)>
I want to export/view the full tensor either in array or CSV format, preferably the latter.

ValueError: None is only supported in the 1st dimension. Tensor 'flatbuffer_data' has invalid shape '[None, None, 1, 512]'

I am trying to convert my tensorflow model (2.0) into tensorflow lite format. My model has two input layers as follows:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import load_model
from tensorflow.keras.layers import Lambda, Input, add, Dot, multiply, dot
from tensorflow.keras.backend import dot, transpose, expand_dims
from tensorflow.keras.models import Model
r1 = Input(shape=[None, 1, 512], name='flatbuffer_data') # I want to take a variable amount of
# 512 float embeddings from my flatbuffer, if the flatbuffer has 4, embeddings then it would
# be inferred as shape=[4, 1, 512], if it has a 100 embeddings, then it is [100, 1, 512].
r2 = Input(shape=[1, 512], name='query_embedding')
#Example code
minus_r1 = Lambda(lambda x: -x, name='invert_value')(r1)
subtracted = add([r2, minus_r1], name='embeddings_diff')
out1 = tf.argsort(subtracted)
out2 = tf.sort(subtracted)
model = Model([r1, r2], [out1, out2])
I am then doing some tensor operations on the layers and saving the models as follows (there is no training and hence no trainable parameters, just some linear algebra ops which I want to port to android)
model.save('combined_model.h5')
I get my tensorflow .h5 model , thus but then when I try to convert it into, tensorflow lite, I get the following error:
import tensorflow as tf
model = tf.keras.models.load_model('combined_model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
#Error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/aspiring1/.virtualenvs/faiss/lib/python3.6/site-packages/tensorflow_core/lite/python/lite.py", line 446, in convert
"invalid shape '{1}'.".format(_get_tensor_name(tensor), shape_list))
ValueError: None is only supported in the 1st dimension. Tensor 'flatbuffer_data' has invalid shape '[None, None, 1, 512]'.
I know that we had dynamic and static shape inference in tensorflow 1.x using tensorflow placeholders. Is there an analogue here in tensorflow 2.x. Also, I'd appreciate a solution in tensorflow 1.x too.
Some answers and blogs I've read that might help:
Tensorflow: how to save/restore a model?
Understand dynamic and static shape in tensorflow
Understanding tensorflow shapes
Using the first link above I also tried creating a tensorflow 1.x graph and tried saving it using the saved model format, but I don't get the desired results.
You can find my code for the same here: tensorflow 1.x gist code

Full code: https://drive.google.com/file/d/1MN4-FX_-hz3y-UAuf7OTj_XYuVTlsSTP/view?usp=sharing
Why doesn't this work?
I know that we had dynamic and static shape inference in tensorflow 1.x using tensorflow placeholders. Is there an analogue here in tensorflow 2.x
That all still works fine. I think the problem is that tf.lite doesn't handle dynamic shapes. I think it prealocates all it's tensors, once and re-uses them (I could be wrong).
So, first of all that extra dimension:
[None, None, 1, 512]
keras.Input always includes a batch dimension, which tf.lite can handle being unknown (this restriction seems relaxed in tf-nightly).
But lite seems to prefer a batch dimension of 1. If you switch to:
r1 = Input(shape=[4], batch_size=None, name='flatbuffer_data')
r2 = Input(shape=[4], batch_size=1, name='query_embedding')
Passes the conversion, but still fails when you try to execute the tflite model, because the model wants all unknown dimensions to be 1:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
i = tf.lite.Interpreter(model_content=tflite_model)
i.allocate_tensors()
i.get_input_details()
i.set_tensor(0, tf.constant([[0.,0,0,0],[1,1,1,1],[2,2,2,2]]))
i.set_tensor(1, tf.constant([[0.,0,0,0]]))
ValueError: Cannot set tensor: Dimension mismatch. Got 3 but expected 1 for dimension 0 of input 0.
With tf-nightly you can convert the model as you've written it, but that also fails to run since the unknown dimension assumed to be 1:
r1 = Input(shape=[None, 4], name='flatbuffer_data')
r2 = Input(shape=[1, 4], name='query_embedding')
...
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
i = tf.lite.Interpreter(model_content=tflite_model)
i.allocate_tensors()
print(i.get_input_details())
i.set_tensor(0, tf.constant([[[0.,0,0,0],[1,1,1,1],[2,2,2,2]]]))
i.set_tensor(1, tf.constant([[[0.,0,0,0]]]))
ValueError: Cannot set tensor: Dimension mismatch. Got 3 but expected 1 for dimension 1 of input 0.
Solution? No. Almost.
I think you need to give that array a size larger than you expect it to be, and pass an int telling your model how many elements to slice out:
n = Input(shape=(), dtype=tf.int32, name='num_inputs')
r1 = Input(shape=[1000, 4], name='flatbuffer_data')
r2 = Input(shape=[4], name='query_embedding')
#Example code
x = tf.reshape(r1, [1000,4])
x = tf.gather(x, tf.range(tf.squeeze(n)))
minus_r1 = Lambda(lambda x: -x, name='invert_value')(x)
subtracted = add([r2, minus_r1], name='embeddings_diff')
out1 = tf.argsort(subtracted, name='argsort')
out2 = tf.sort(subtracted, name="sorted")
model = Model([r1, r2, n], [out1, out2])
Then it works:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
i = tf.lite.Interpreter(model_content=tflite_model)
i.allocate_tensors()
for d in i.get_input_details():
print(d)
a = np.zeros([1000, 4], dtype=np.float32)
a[:3] = [
[0.,0,0,0],
[1,1,1,1],
[2,2,2,2]]
i.set_tensor(0, tf.constant(a[np.newaxis,...], dtype=tf.float32))
i.set_tensor(1, tf.constant([[0.,0,0,0]]))
i.set_tensor(2, tf.constant([3], dtype=tf.int32))
i.invoke()
print()
for d in i.get_output_details():
print(i.get_tensor(d['index']))
[[ 0. 0. 0. 0.]
[-1. -1. -1. -1.]
[-2. -2. -2. -2.]]
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]
OP tried this in a java interpreter and got:
java.lang.IllegalArgumentException: Internal error: Failed to apply delegate: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors.
So we're not quite done yet.

tensorflow lite conversion changes model weights

I have a model saved in .pb file and it works fine, but when I convert it to tflite model with command tflite_convert or python api, the result is wrong.
And I find that the weights have been changed after conversion.
The weights of first convolution layer in .pb file are as follows:
shape -
(3, 3, 1, 8)
Value -
[[[[-0.09953183 0.11656161 0.1101007 -0.02618909 -0.21355744
-0.05877252 0.11881053 -0.17588891]]
[[-0.16565287 0.16550814 0.02200373 0.0987333 0.0194475
-0.12387082 -0.06090429 -0.19122925]]
[[-0.19570269 0.11854213 -0.14988026 -0.01476914 0.12554781
-0.1324673 -0.04035608 -0.05299769]]]
[[[ 0.08548407 -0.09644134 0.24321978 0.15008359 -0.2591259
0.2421266 0.02051029 -0.05138292]]
[[ 0.04847065 -0.22357103 -0.00074622 0.19842042 0.00228794
0.13352048 -0.24048899 -0.00679056]]
[[-0.01857976 -0.09324262 -0.19632849 0.02247559 0.18489467
-0.07365554 -0.39479995 0.0622104 ]]]
[[[ 0.13633308 0.04041797 0.10581032 -0.13119537 0.01122213
0.15191257 0.03097369 0.07342041]]
[[ 0.16241515 -0.04534301 -0.06334146 -0.19276966 -0.03890191
0.08520683 -0.0117504 0.14705475]]
[[ 0.07332639 -0.00533756 -0.06285968 -0.12631118 0.09094885
-0.09658462 -0.04983746 0.13325559]]]]
And The weights of first convolution layer in .tflite file are as follows:
shape -
(1, 3, 3, 8)
Value -
[[[[ -31.412575 42.14294 36.269154 -8.724394
-67.77634 -12.249788 43.18692 -76.762474 ]
[ -52.280594 59.839592 7.248423 32.89111
6.1720166 -25.818043 -22.138346 -83.4574 ]
[ -61.764416 42.858997 -49.373257 -4.9200554
39.844883 -27.609783 -14.669194 -23.129562 ]]
[[ 26.979053 -34.86844 80.12097 49.997475
-82.23831 50.46576 7.4553676 -22.424837 ]
[ 15.297499 -80.832275 -0.24581672 66.09997
0.7261198 27.829294 -87.41631 -2.963576 ]
[ -5.8638325 -33.71194 -64.67414 7.487314
58.679688 -15.351814 -143.50742 27.15023 ]]
[[ 43.02717 14.613146 34.855824 -43.705227
3.561546 31.662704 11.25875 32.04257 ]
[ 51.258755 -16.393799 -20.865818 -64.21752
-12.346229 17.759417 -4.2712016 64.1785 ]
[ 23.14205 -1.929798 -20.70711 -42.07815
28.864273 -20.130857 -18.115618 58.15619 ]]]]
It seems there are some relations.
the tensorflow version is 1.12.
The command is
tflite_convert --output_file=graph_net_half.tflite --graph_def_file=graph_net_half.pb --input_arrays=input_image --output_arrays=output_landmark
there is another similar question with no answer:tflite weights

It's expect that the weights will be changed when running tflite_convert for several reasons:
TensorFlow Conv2D uses HWIO weights (filter_height, filter_weight, input_channels, output_channels). TensorFlow Lite Conv2D uses IHWO weights for optimization reasons. The axis of weights must be reordered.
tflite_convert does optimizations like constant folding, which will also change the weights.
Seeing different weights doesn't mean the conversion is wrong.

Tensorflow linear regression result does not match Numpy/SciKit-Learn

I am working through the Tensorflow examples from Aurelien Geron's "Hands-On Machine Learning" book. However, I am unable to replicate the simple linear regression example in this supporting notebook. Why doesn't Tensorflow match the Numpy/SciKit-Learn result?
As far as I can tell, there is no optimization (we are using the normal equation, so it's just matrix computations), and the answers seem too different to be precision errors.
import numpy as np
import tensorflow as tf
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]
X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)
with tf.Session() as sess:
theta_value = theta.eval()
theta_value
Answer:
array([[ -3.74651413e+01],
[ 4.35734153e-01],
[ 9.33829229e-03],
[ -1.06622010e-01],
[ 6.44106984e-01],
[ -4.25131839e-06],
[ -3.77322501e-03],
[ -4.26648885e-01],
[ -4.40514028e-01]], dtype=float32)
###### Compare with pure NumPy
X = housing_data_plus_bias
y = housing.target.reshape(-1, 1)
theta_numpy = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
print(theta_numpy)
Answer:
[[ -3.69419202e+01]
[ 4.36693293e-01]
[ 9.43577803e-03]
[ -1.07322041e-01]
[ 6.45065694e-01]
[ -3.97638942e-06]
[ -3.78654265e-03]
[ -4.21314378e-01]
[ -4.34513755e-01]]
###### Compare with Scikit-Learn
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(housing.data, housing.target.reshape(-1, 1))
print(np.r_[lin_reg.intercept_.reshape(-1, 1), lin_reg.coef_.T])
Answer:
[[ -3.69419202e+01]
[ 4.36693293e-01]
[ 9.43577803e-03]
[ -1.07322041e-01]
[ 6.45065694e-01]
[ -3.97638942e-06]
[ -3.78654265e-03]
[ -4.21314378e-01]
[ -4.34513755e-01]]
Update: My question sounded similar to this one, but following the recommendation did not fix the problem.

I just compare the results by tensorflow and numpy. Since you used dtype=tf.float32 for X and y, I will use np.float32 for the numpy example as follows:
X_numpy = housing_data_plus_bias.astype(np.float32)
y_numpy = housing.target.reshape(-1, 1).astype(np.float32)
Now let's try to compare the results by tf.matmul(XT, X) (tensorflow) and X.T.dot(X) (numpy):
with tf.Session() as sess:
XTX_value = tf.matmul(XT, X).eval()
XTX_numpy = X_numpy.T.dot(X_numpy)
np.allclose(XTX_value, XTX_numpy, rtol=1e-06) # True
np.allclose(XTX_value, XTX_numpy, rtol=1e-07) # False
So this is the problem of float's precision. If you change the precision to tf.float64 and np.float64, you will have the same result for theta.

Reset weights in Keras layer

I'd like to reset (randomize) the weights of all layers in my Keras (deep learning) model. The reason is that I want to be able to train the model several times with different data splits without having to do the (slow) model recompilation every time.
Inspired by this discussion, I'm trying the following code:
# Reset weights
for layer in KModel.layers:
if hasattr(layer,'init'):
input_dim = layer.input_shape[1]
new_weights = layer.init((input_dim, layer.output_dim),name='{}_W'.format(layer.name))
layer.trainable_weights[0].set_value(new_weights.get_value())
However, it only partly works.
Partly, becuase I've inspected some layer.get_weights() values, and they seem to change. But when I restart the training, the cost values are much lower than the initial cost values on the first run. It's almost like I've succeeded resetting some of the weights, but not all of them.

Save the initial weights right after compiling the model but before training it:
model.save_weights('model.h5')
and then after training, "reset" the model by reloading the initial weights:
model.load_weights('model.h5')
This gives you an apples to apples model to compare different data sets and should be quicker than recompiling the entire model.

Reset all layers by checking for initializers:
def reset_weights(model):
import keras.backend as K
session = K.get_session()
for layer in model.layers:
if hasattr(layer, 'kernel_initializer'):
layer.kernel.initializer.run(session=session)
if hasattr(layer, 'bias_initializer'):
layer.bias.initializer.run(session=session)
Update: kernel_initializer is kernel.initializer now.

If you want to truly re-randomize the weights, and not merely restore the initial weights, you can do the following. The code is slightly different depending on whether you're using TensorFlow or Theano.
from keras.initializers import glorot_uniform # Or your initializer of choice
import keras.backend as K
initial_weights = model.get_weights()
backend_name = K.backend()
if backend_name == 'tensorflow':
k_eval = lambda placeholder: placeholder.eval(session=K.get_session())
elif backend_name == 'theano':
k_eval = lambda placeholder: placeholder.eval()
else:
raise ValueError("Unsupported backend")
new_weights = [k_eval(glorot_uniform()(w.shape)) for w in initial_weights]
model.set_weights(new_weights)

I have found the clone_model function that creates a cloned network with the same architecture but new model weights.
Example of use:
model_cloned = tensorflow.keras.models.clone_model(model_base)
Comparing the weights:
original_weights = model_base.get_weights()
print("Original weights", original_weights)
print("========================================================")
print("========================================================")
print("========================================================")
model_cloned = tensorflow.keras.models.clone_model(model_base)
new_weights = model_cloned.get_weights()
print("New weights", new_weights)
If you execute this code several times, you will notice that the cloned model receives new weights each time.

Tensorflow 2 answer:
for ix, layer in enumerate(model.layers):
if hasattr(model.layers[ix], 'kernel_initializer') and \
hasattr(model.layers[ix], 'bias_initializer'):
weight_initializer = model.layers[ix].kernel_initializer
bias_initializer = model.layers[ix].bias_initializer
old_weights, old_biases = model.layers[ix].get_weights()
model.layers[ix].set_weights([
weight_initializer(shape=old_weights.shape),
bias_initializer(shape=old_biases.shape)])
Original weights:
model.layers[1].get_weights()[0][0]
array([ 0.4450057 , -0.13564804, 0.35884023, 0.41411972, 0.24866664,
0.07641453, 0.45726687, -0.04410008, 0.33194816, -0.1965386 ,
-0.38438258, -0.13263905, -0.23807487, 0.40130925, -0.07339832,
0.20535922], dtype=float32)
New weights:
model.layers[1].get_weights()[0][0]
array([-0.4607593 , -0.13104361, -0.0372932 , -0.34242013, 0.12066692,
-0.39146423, 0.3247317 , 0.2635846 , -0.10496247, -0.40134245,
0.19276887, 0.2652442 , -0.18802321, -0.18488845, 0.0826562 ,
-0.23322225], dtype=float32)

K.get_session().close()
K.set_session(tf.Session())
K.get_session().run(tf.global_variables_initializer())

Try set_weights.
for example:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import numpy as np
np.random.seed(1234)
from keras.layers import Input
from keras.layers.convolutional import Convolution2D
from keras.models import Model
print("Building Model...")
inp = Input(shape=(1,None,None))
x = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(inp)
output = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(x)
model_network = Model(input=inp, output=output)
w = np.asarray([
[[[
[0,0,0],
[0,2,0],
[0,0,0]
]]]
])
for layer_i in range(len(model_network.layers)):
print (model_network.layers[layer_i])
for layer_i in range(1,len(model_network.layers)):
model_network.layers[layer_i].set_weights(w)
input_mat = np.asarray([
[[
[1.,2.,3.,10.],
[4.,5.,6.,11.],
[7.,8.,9.,12.]
]]
])
print("Input:")
print(input_mat)
print("Output:")
print(model_network.predict(input_mat))
w2 = np.asarray([
[[[
[0,0,0],
[0,3,0],
[0,0,0]
]]]
])
for layer_i in range(1,len(model_network.layers)):
model_network.layers[layer_i].set_weights(w2)
print("Output:")
print(model_network.predict(input_mat))
build a model with say, two convolutional layers
print("Building Model...")
inp = Input(shape=(1,None,None))
x = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(inp)
output = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(x)
model_network = Model(input=inp, output=output)
then define your weights (i'm using a simple w, but you could use np.random.uniform or anything like that if you want)
w = np.asarray([
[[[
[0,0,0],
[0,2,0],
[0,0,0]
]]]
])
Take a peek at what are the layers inside a model
for layer_i in range(len(model_network.layers)):
print (model_network.layers[layer_i])
Set each weight for each convolutional layer (you'll see that the first layer is actually input and you don't want to change that, that's why the range starts from 1 not zero).
for layer_i in range(1,len(model_network.layers)):
model_network.layers[layer_i].set_weights(w)
Generate some input for your test and predict the output from your model
input_mat = np.asarray([
[[
[1.,2.,3.,10.],
[4.,5.,6.,11.],
[7.,8.,9.,12.]
]]
])
print("Output:")
print(model_network.predict(input_mat))
You could change it again if you want and check again for the output:
w2 = np.asarray([
[[[
[0,0,0],
[0,3,0],
[0,0,0]
]]]
])
for layer_i in range(1,len(model_network.layers)):
model_network.layers[layer_i].set_weights(w2)
print("Output:")
print(model_network.predict(input_mat))
Sample output:
Using Theano backend.
Building Model...
<keras.engine.topology.InputLayer object at 0x7fc0c619fd50>
<keras.layers.convolutional.Convolution2D object at 0x7fc0c6166250>
<keras.layers.convolutional.Convolution2D object at 0x7fc0c6150a10>
Weights after change:
[array([[[[ 0., 0., 0.],
[ 0., 2., 0.],
[ 0., 0., 0.]]]], dtype=float32)]
Input:
[[[[ 1. 2. 3. 10.]
[ 4. 5. 6. 11.]
[ 7. 8. 9. 12.]]]]
Output:
[[[[ 4. 8. 12. 40.]
[ 16. 20. 24. 44.]
[ 28. 32. 36. 48.]]]]
Output:
[[[[ 9. 18. 27. 90.]
[ 36. 45. 54. 99.]
[ 63. 72. 81. 108.]]]]
From your peek at .layers you can see that the first layer is input and the others your convolutional layers.

For tf2 the simplest way to actually reset weights would be:
tf_model.set_weights(
clone_model(tf_model).get_weights()
)
clone_model() as mentioned by #danielsaromo returns new model with trainable params initialized from scratch, we use its weights to reinitialize our model thus no model compilation (knowledge about its loss or optimizer) is needed.
There are two caveats though, first is mentioned in clone_model()'s documentation:
clone_model will not preserve the uniqueness of shared objects within the model (e.g. a single variable attached to two distinct layers will be restored as two separate variables).
Another caveat is that for large models cloning might fail due to memory limit.

To "random" re-initialize weights of a compiled untrained model in TF 2.0 (tf.keras):
weights = [glorot_uniform(seed=random.randint(0, 1000))(w.shape) if w.ndim > 1 else w for w in model.get_weights()]
Note the "if wdim > 1 else w". You don't want to re-initialize the biases (they stay 0 or 1).

use keras.backend.clear_session()

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Trying to achieve same result with Pytorch and Tensorflow MultiheadAttention - python

Related

Exporting/Viewing the whole Tensor (764-d)

ValueError: None is only supported in the 1st dimension. Tensor 'flatbuffer_data' has invalid shape '[None, None, 1, 512]'

tensorflow lite conversion changes model weights

Tensorflow linear regression result does not match Numpy/SciKit-Learn

Reset weights in Keras layer

Categories

Resources