I'm trying to recreate a transformer written in PyTorch and implement it in TensorFlow. The problem is that, even though I followed the documentation for both the PyTorch and the TensorFlow versions, the outputs still come out quite different.
I wrote a little code snippet to show the issue:
import torch
import tensorflow as tf
import numpy as np
class TransformerLayer(tf.Module):
    def __init__(self, d_model, nhead, dropout=0):
        super(TransformerLayer, self).__init__()
        self.self_attn = torch.nn.MultiheadAttention(d_model, nhead, dropout=dropout)
batch_size = 2
seq_length = 5
d_model = 10
src = np.random.uniform(size=(batch_size, seq_length, d_model))
srcTF = tf.convert_to_tensor(src)
srcPT = torch.Tensor(src.reshape((seq_length, batch_size, d_model)))
self_attnTF = tf.keras.layers.MultiHeadAttention(key_dim=10, num_heads=5, dropout=0)
transformer_encoder = TransformerLayer(d_model=10, nhead=5, dropout=0.0)
output, scores = self_attnTF(srcTF, srcTF, srcTF, return_attention_scores=True)
print("Tensorflow Attendtion outputs:", output)
print("Tensorflow (averaged) weights:", tf.math.reduce_mean(scores, 1))
print("Torch Attendtion outputs:", transformer_encoder.self_attn(srcPT,srcPT,srcPT)[0])
print("Torch attention output weights:", transformer_encoder.self_attn(srcPT,srcPT,srcPT)[1])
and the result is:
Tensorflow Attendtion outputs: tf.Tensor(
[[[ 0.02602757 -0.14134401 0.00855263 0.4735083 -0.01851891
-0.20382246 -0.18152176 -0.21076852 0.08623976 -0.33548725]
[ 0.02607442 -0.1403394 0.00814065 0.47415024 -0.01882939
-0.20353754 -0.18291879 -0.21234266 0.08595885 -0.33613583]
[ 0.02524654 -0.14096384 0.00870436 0.47411725 -0.01800703
-0.20486829 -0.18163288 -0.21082559 0.08571021 -0.3362339 ]
[ 0.02518575 -0.14039244 0.0090138 0.47431853 -0.01775141
-0.20391947 -0.18138805 -0.2118245 0.08432849 -0.33521986]
[ 0.02556361 -0.14039293 0.00876258 0.4746476 -0.01891363
-0.20398234 -0.18229616 -0.21147579 0.08555281 -0.33639923]]
[[ 0.07844199 -0.1614371 0.01649148 0.5287745 0.05126739
-0.13851154 -0.09829871 -0.1621251 0.01922669 -0.2428589 ]
[ 0.07844222 -0.16024739 0.01805423 0.52941847 0.04975721
-0.13537636 -0.09829231 -0.16129729 0.01979005 -0.24491176]
[ 0.07800542 -0.160701 0.01677295 0.52902794 0.05082911
-0.13843337 -0.09805533 -0.16165744 0.01928401 -0.24327613]
[ 0.07815789 -0.1600025 0.01757433 0.5291927 0.05032986
-0.1368022 -0.09849522 -0.16172451 0.01929555 -0.24438493]
[ 0.0781548 -0.16028519 0.01764914 0.52846324 0.04941286
-0.13746066 -0.09787872 -0.16141161 0.01994199 -0.2440269 ]]], shape=(2, 5, 10), dtype=float32)
Tensorflow (averaged) weights: tf.Tensor(
[[[0.199085 0.20275716 0.20086522 0.19873264 0.19856 ]
[0.2015336 0.19960018 0.20218948 0.19891861 0.19775811]
[0.19906266 0.20318432 0.20190334 0.19812575 0.19772394]
[0.20074987 0.20104568 0.20269363 0.19744729 0.19806348]
[0.19953248 0.20176074 0.20314851 0.19782843 0.19772986]]
[[0.2010009 0.20053487 0.20004745 0.20092985 0.19748697]
[0.20034568 0.20035927 0.19955876 0.20062163 0.19911464]
[0.19967113 0.2006859 0.20012529 0.20047483 0.19904283]
[0.20132652 0.19996871 0.20019794 0.20008174 0.19842513]
[0.2006393 0.20000939 0.19938737 0.20054278 0.19942114]]], shape=(2, 5, 5), dtype=float32)
Torch Attendtion outputs: tensor([[[ 0.1097, -0.4467, -0.0719, -0.1779, -0.0766, -0.1247, 0.1557,
0.0051, -0.3932, -0.1323],
[ 0.1264, -0.3822, 0.0759, -0.0335, -0.1084, -0.1539, 0.1475,
-0.0272, -0.4235, -0.1744]],
[[ 0.1122, -0.4502, -0.0747, -0.1796, -0.0756, -0.1271, 0.1581,
0.0049, -0.3964, -0.1340],
[ 0.1274, -0.3823, 0.0754, -0.0356, -0.1091, -0.1547, 0.1477,
-0.0272, -0.4252, -0.1752]],
[[ 0.1089, -0.4427, -0.0728, -0.1746, -0.0756, -0.1202, 0.1501,
0.0031, -0.3894, -0.1242],
[ 0.1263, -0.3820, 0.0718, -0.0374, -0.1063, -0.1562, 0.1485,
-0.0271, -0.4233, -0.1761]],
[[ 0.1061, -0.4369, -0.0685, -0.1696, -0.0772, -0.1173, 0.1454,
0.0012, -0.3860, -0.1201],
[ 0.1265, -0.3820, 0.0762, -0.0325, -0.1082, -0.1560, 0.1501,
-0.0271, -0.4249, -0.1779]],
[[ 0.1043, -0.4402, -0.0705, -0.1719, -0.0791, -0.1205, 0.1508,
0.0018, -0.3895, -0.1262],
[ 0.1260, -0.3805, 0.0775, -0.0298, -0.1083, -0.1547, 0.1494,
-0.0276, -0.4242, -0.1768]]], grad_fn=<AddBackward0>)
Torch attention output weights: tensor([[[0.2082, 0.2054, 0.1877, 0.1956, 0.2031],
[0.2100, 0.2079, 0.1841, 0.1943, 0.2037],
[0.2007, 0.1995, 0.1929, 0.1999, 0.2070],
[0.1995, 0.1950, 0.1976, 0.2002, 0.2077],
[0.1989, 0.1969, 0.1970, 0.2024, 0.2048]],
[[0.2095, 0.1902, 0.1987, 0.2027, 0.1989],
[0.2090, 0.1956, 0.1997, 0.2004, 0.1952],
[0.2047, 0.1869, 0.2006, 0.2121, 0.1957],
[0.2073, 0.1953, 0.1982, 0.2014, 0.1978],
[0.2089, 0.2003, 0.1953, 0.1957, 0.1998]]], grad_fn=<DivBackward0>)
The output weights look similar but the base attention outputs are way off. Is there any way to make the Tensorflow model come out more like the Pytorch one? Any help would be greatly appreciated!
MultiHeadAttention also contains projection layers, for example:
Q = W_q @ input_query + b_q
K = W_k @ input_keys + b_k
V = W_v @ input_values + b_v
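The matrices W_q, W_k and W_v and the biases b_q, b_k and b_v are initialized randomly, so differing outputs should be expected (even between two distinct PyTorch layers given the same input). After the self-attention operation there is one more projection, which is also initialized randomly. In TensorFlow the weights can be set manually by calling the set_weights method of self_attnTF.
The correspondence between the weights in tf.keras.layers.MultiHeadAttention and nn.MultiheadAttention is not obvious. For example, torch packs the query, key and value projections into a single in_proj_weight tensor that is split across heads, while tf keeps a separate kernel per projection with an explicit per-head axis. So if you take the weights of a pretrained PyTorch model and try to load them into a TensorFlow model (for whatever reason), it will certainly take more than five minutes.
The results should be the same if, after initializing the PyTorch model and the TensorFlow model, you step through their parameters and assign them identical values. This also requires matching shapes and inputs: in the snippet above, key_dim=10 gives each TensorFlow head a size of 10, whereas PyTorch uses d_model / nhead = 2 per head, and the PyTorch input should be built with a transpose to (seq_length, batch_size, d_model) rather than a reshape, which reorders the values.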
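To illustrate the point about random initialization, here is a minimal NumPy sketch of multi-head self-attention with explicit weights. It is not the PyTorch or TensorFlow implementation; init_params and multi_head_attention are hypothetical helpers, and the row-vector convention (x @ W) is an assumption made for brevity. The same input gives different outputs under two random initializations and identical outputs once the weights are identical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(d_model, seed):
    """Randomly initialise the Q, K, V and output projections (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    params = {}
    for name in ("q", "k", "v", "o"):
        params["W_" + name] = rng.normal(scale=0.1, size=(d_model, d_model))
        params["b_" + name] = np.zeros(d_model)
    return params

def multi_head_attention(x, params, num_heads):
    """Self-attention on x of shape (batch, seq, d_model)."""
    batch, seq, d_model = x.shape
    head_dim = d_model // num_heads

    def project(name):
        # Row-vector convention: x @ W + b (an assumption of this sketch).
        y = x @ params["W_" + name] + params["b_" + name]
        # (batch, seq, d_model) -> (batch, heads, seq, head_dim)
        return y.reshape(batch, seq, num_heads, head_dim).transpose(0, 2, 1, 3)

    q, k, v = project("q"), project("k"), project("v")
    scores = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim))
    context = (scores @ v).transpose(0, 2, 1, 3).reshape(batch, seq, d_model)
    # Final output projection, then attention weights averaged over heads.
    return context @ params["W_o"] + params["b_o"], scores.mean(axis=1)

src = np.random.default_rng(0).uniform(size=(2, 5, 10))
out_a, w_a = multi_head_attention(src, init_params(10, seed=1), num_heads=5)
out_b, w_b = multi_head_attention(src, init_params(10, seed=2), num_heads=5)
out_c, w_c = multi_head_attention(src, init_params(10, seed=1), num_heads=5)

print("different weights, same output?", np.allclose(out_a, out_b))
print("identical weights, same output?", np.allclose(out_a, out_c))
```

With small random weights the averaged attention weights all sit near 1/seq_length, which is probably why the weights in the question look similar while the outputs do not.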
I recently started using PyTorch after working with TensorFlow. I have a piece of code implemented in TensorFlow that I now want to convert to PyTorch.
I'm new to PyTorch and not yet familiar with its functions, so the conversion is not going smoothly and I'd like some advice.
Here's the code I want to convert:
def kl_loss_compute(logits1, logits2):
    """ KL loss
    """
    pred1 = tf.nn.softmax(logits1)
    pred2 = tf.nn.softmax(logits2)
    loss = tf.reduce_mean(tf.reduce_sum(pred2 * tf.log(1e-8 + pred2 / (pred1 + 1e-8)), 1))
    return loss
python: 3.6, ubuntu: 16.04
logits1 and logits2 are FC layer's outputs. Their shape is [batch, n]
Here is my Implementation (I am taking an example of logits of dimension [3,5]):
Tensorflow Version:
import tensorflow as tf
def kl_loss_compute(logits1, logits2):
    """ KL loss
    """
    pred1 = tf.nn.softmax(logits1)
    print(pred1.eval())
    pred2 = tf.nn.softmax(logits2)
    print(pred2.eval())
    loss = tf.reduce_mean(tf.reduce_sum(pred2 * tf.log(1e-8 + pred2 / (pred1 + 1e-8)), 1))
    return loss
x1 = tf.random.normal([3, 5], dtype=tf.float32)
x2 = tf.random.normal([3, 5], dtype=tf.float32)
with tf.Session() as sess:
    x1 = sess.run(x1)
    print(x1)
    x2 = sess.run(x2)
    print(x2)
    print(30*'=')
    print(sess.run(kl_loss_compute(x1, x2)))
Output:
[[ 0.9801388 -0.2514422 -0.28299806 0.85130763 0.4565948 ]
[-1.0744809 0.20301117 0.21026622 1.0385195 0.41147012]
[ 1.2385081 1.1003486 -2.0818367 -1.0446491 1.8817908 ]]
[[ 0.04036871 0.82306993 0.82962424 0.5209219 -0.10473887]
[ 1.7777447 -0.6257034 -0.68985045 -1.1191329 -0.2600192 ]
[ 0.03387258 0.44405013 0.08010675 0.9131149 0.6422863 ]]
==============================
[[0.32828477 0.09580362 0.09282765 0.2886025 0.19448158]
[0.04786159 0.17170973 0.17296004 0.39596024 0.21150835]
[0.2556382 0.22265059 0.00923886 0.02606533 0.48640704]]
[[0.12704821 0.27790183 0.27972925 0.20543297 0.10988771]
[0.7349108 0.06644011 0.062312 0.04056362 0.09577343]
[0.12818882 0.19319147 0.13425465 0.30881628 0.23554876]]
0.96658206
PyTorch Version:
import torch
def kl_loss_compute(logits1, logits2):
    """ KL loss
    """
    pred1 = torch.softmax(logits1, dim=-1, dtype=torch.float32)
    print(pred1)
    pred2 = torch.softmax(logits2, dim=-1, dtype=torch.float32)
    print(pred2)
    loss = torch.mean(torch.sum(pred2 * torch.log(1e-8 + pred2 / (pred1 + 1e-8)), -1))
    return loss
# same inputs are used here as above(see the inputs used in tensorflow code in the output)
x = torch.Tensor([[ 0.9801388, -0.2514422 , -0.28299806 , 0.85130763, 0.4565948 ],
[-1.0744809 , 0.20301117, 0.21026622, 1.0385195, 0.41147012],
[ 1.2385081 , 1.1003486, -2.0818367, -1.0446491, 1.8817908 ]])
y = torch.Tensor([[ 0.04036871 , 0.82306993, 0.82962424, 0.5209219, -0.10473887],
[ 1.7777447 ,-0.6257034, -0.68985045, -1.1191329, -0.2600192 ],
[ 0.03387258 , 0.44405013 , 0.08010675, 0.9131149, 0.6422863 ]])
print(kl_loss_compute(x, y))
Output:
tensor([[0.3283, 0.0958, 0.0928, 0.2886, 0.1945],
[0.0479, 0.1717, 0.1730, 0.3960, 0.2115],
[0.2556, 0.2227, 0.0092, 0.0261, 0.4864]])
tensor([[0.1270, 0.2779, 0.2797, 0.2054, 0.1099],
[0.7349, 0.0664, 0.0623, 0.0406, 0.0958],
[0.1282, 0.1932, 0.1343, 0.3088, 0.2355]])
tensor(0.9666)
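As a framework-independent cross-check, the same formula can be sketched in plain NumPy. This is only an illustration under the assumption that the loss is the row-wise KL divergence of pred2 from pred1, averaged over the batch, as in both versions above; the softmax helper is not from either library. The inputs are the logits printed in the outputs above.

```python
import numpy as np

def softmax(x):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def kl_loss_compute(logits1, logits2, eps=1e-8):
    """Mean over the batch of sum(pred2 * log(pred2 / pred1)) per row."""
    pred1 = softmax(logits1)
    pred2 = softmax(logits2)
    per_row = np.sum(pred2 * np.log(eps + pred2 / (pred1 + eps)), axis=-1)
    return per_row.mean()

x = np.array([[0.9801388, -0.2514422, -0.28299806, 0.85130763, 0.4565948],
              [-1.0744809, 0.20301117, 0.21026622, 1.0385195, 0.41147012],
              [1.2385081, 1.1003486, -2.0818367, -1.0446491, 1.8817908]])
y = np.array([[0.04036871, 0.82306993, 0.82962424, 0.5209219, -0.10473887],
              [1.7777447, -0.6257034, -0.68985045, -1.1191329, -0.2600192],
              [0.03387258, 0.44405013, 0.08010675, 0.9131149, 0.6422863]])

loss = kl_loss_compute(x, y)
print(float(loss))
```

The value should agree with the TensorFlow and PyTorch results above to within floating-point precision.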
I am quite new to Python and interested in doing Gaussian process regression.
I am using Python 3.6 and scikit-learn 0.19.
I have some simple code and I get an error about the dimensions of the vectors in cdist, called by predict. I understand there is something wrong with my input, but I do not see what.
I looked for examples of the Gaussian process regressor, but it does not seem to be one of the most common tools.
Thanks in advance for your help.
Cheers.
Here is a sample of my code:
import pandas as pd
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor as gpr
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
....
#X_train are the training samples
X_train= np.column_stack((xc,yc,zc))
print('X_train')
print(X_train.shape)
print(X_train)
Here is the print of X_train:
X_train (4576, 3)
[[ 0.71958336 -1.12719598 0.47889958]
[ 0.71958336 -1.12719598 0.47889958]
[ 0.71958336 -1.12719598 0.34285071]
...
[ 0.55255508 -1.18817547 -1.63666023]
[ 0.55255508 -1.18817547 -1.70468466]
[ 0.55255508 -1.18817547 -1.77270909]]
Here is the target feature of the training set:
print('v1')
print(v1.shape)
print(v1)
Its output:
v1
(4576,)
0 10.0
1 14.0
2 13.0
3 19.0
....
4573 39.0
4574 16.0
4575 12.0
Here are the samples to predict:
x = np.column_stack((xp,
yp,
zp))
print('x')
print(x.shape)
print(x)
Here is the output:
x
(75, 3)
[[-1.41421356 -1.41421356 -1.22474487]
[-0.70710678 -1.41421356 -1.22474487]
[ 0. -1.41421356 -1.22474487]
[ 0.70710678 -1.41421356 -1.22474487]
.....
[ 0.70710678 -0.70710678 -1.22474487]
[ 1.41421356 -0.70710678 -1.22474487]
[-1.41421356 0. -1.22474487]
[-0.70710678 0. -1.22474487]
[ 0. 0. -1.22474487]
Here is the fitting and prediction
v1 = v1.ravel()
#default kernel
kernel = C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2))
X_train, v1 = make_regression()
model = gpr(kernel=kernel, n_restarts_optimizer=9)
model.fit(X_train,v1)
#Predict v1
v1_pred = model.predict(x)
When running it, I get the following error:
File "test.py", line 189, in test
    v1_pred = model.predict(x)
File "/usr/local/lib/python3.6/site-packages/sklearn/gaussian_process/gpr.py", line 315, in predict
    K_trans = self.kernel_(X, self.X_train_)
File "/usr/local/lib/python3.6/site-packages/sklearn/gaussian_process/kernels.py", line 758, in __call__
    return self.k1(X, Y) * self.k2(X, Y)
File "/usr/local/lib/python3.6/site-packages/sklearn/gaussian_process/kernels.py", line 1215, in __call__
    metric='sqeuclidean')
File "/usr/local/lib/python3.6/site-packages/scipy/spatial/distance.py", line 2373, in cdist
    raise ValueError('XA and XB must have the same number of columns '
ValueError: XA and XB must have the same number of columns (i.e. feature dimension.)
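I had simply copy-pasted some code and did something stupid:
X_train, v1 = make_regression()
This line overwrites X_train and v1 with a synthetic dataset (100 features by default), so the model is fitted on data whose column count no longer matches the 3 columns of x. I just had to remove it.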
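The mismatch can be reproduced without scikit-learn. The sketch below uses a simplified, hypothetical stand-in for the column check that cdist performs (sq_euclidean_cdist is not the SciPy function), with random placeholder arrays in the shapes discussed above: 3-column data works, while a 100-column training set like the one make_regression() produces by default raises the same kind of error.

```python
import numpy as np

def sq_euclidean_cdist(XA, XB):
    """Pairwise squared Euclidean distances.

    Simplified stand-in for scipy's cdist(XA, XB, metric='sqeuclidean'),
    written only to show the feature-dimension check.
    """
    XA = np.asarray(XA, dtype=float)
    XB = np.asarray(XB, dtype=float)
    if XA.shape[1] != XB.shape[1]:
        raise ValueError(
            "XA and XB must have the same number of columns "
            "(got %d and %d)" % (XA.shape[1], XB.shape[1]))
    diff = XA[:, None, :] - XB[None, :, :]
    return (diff ** 2).sum(axis=-1)

rng = np.random.default_rng(0)
x_new = rng.normal(size=(75, 3))                   # samples to predict
X_train_real = rng.normal(size=(200, 3))           # placeholder with 3 features
X_train_overwritten = rng.normal(size=(100, 100))  # default make_regression() shape

ok_shape = sq_euclidean_cdist(x_new, X_train_real).shape
print(ok_shape)

try:
    sq_euclidean_cdist(x_new, X_train_overwritten)
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```

Printing X_train.shape right before model.fit would have revealed the overwritten array immediately.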
I want to perform a neural network regression on a data set. For testing purposes I have sampled it down to 10000 rows. The input is 3 columns, the output is 1 column. I use the code below (I've replaced variable names).
import pandas as pd
import numpy as np
import os
from sklearn.neural_network import MLPRegressor
"""
Prepare
"""
train = os.path.join(r'C:\Documents and Settings', 'input.csv')
df = pd.read_csv(train)
df = df[['A', 'B', 'C', 'D']]
df = df.dropna().sample(n=10000)
y = df['D'].values.reshape(10000, 1)
x = df[['A', 'B', 'C']].values.reshape(10000, 3)
print(x)
print(y)
print("Length before regression, x: %s, y: %s" % (x.shape, y.shape))
"""
Regression
"""
mlp = MLPRegressor(hidden_layer_sizes=(5, ), activation='relu', verbose=True, learning_rate_init=1, learning_rate='adaptive', max_iter=500,)
mlp.fit(x, y)
mlp.score(x, y)
print(mlp.coefs_)
print(mlp.n_layers_)
print(mlp.n_outputs_)
print(mlp.out_activation_)
# Predict before printing: res must exist before it is used.
res = mlp.predict(x)
print("res: %s" % res)
r = np.subtract(df['D'].values, res)
Running this code gives the following output:
[[ 162. 9. 475.5 ]
[ 105. 6.39 232.5 ]
[ 141. 7.44 373.5 ]
...,
[ 120. 8.41 450.5 ]
[ 120. 8.77 464. ]
[ 160. 8.77 483. ]]
[[ 72. ]
[ 73. ]
[ 74.5]
...,
[ 53. ]
[ 52. ]
[ 73. ]]
Length before regression, x: (10000, 3), y: (10000, 1)
Iteration 1, loss = 43928.72815906
Iteration 2, loss = 3434.26257670
Iteration 3, loss = 2393.24701752
Iteration 4, loss = 1662.31634550
Iteration 5, loss = 1225.37443598
Iteration 6, loss = 997.21761203
Iteration 7, loss = 891.10992049
Iteration 8, loss = 847.20461842
Iteration 9, loss = 830.60945144
Iteration 10, loss = 825.10945455
Iteration 11, loss = 823.39941482
Iteration 12, loss = 822.96788084
Iteration 13, loss = 822.85930250
Iteration 14, loss = 822.83848702
Iteration 15, loss = 822.84245376
Iteration 16, loss = 822.84871312
Iteration 17, loss = 822.83965835
Training loss did not improve more than tol=0.000100 for two consecutive epochs. Stopping.
[array([[-5.33, -5.23, -5.15, -4.86, -5.68],
[-5.28, -5.86, -5.83, -5.98, -6.2 ],
[-5.32, -5.79, -5.02, -4.71, -5.87]]), array([[-5.69],
[-5.06],
[ 4.35],
[ 4.6 ],
[-5.66]])]
3
1
identity
res: [ 95.53 95.53 95.53 ..., 95.53 95.53 95.53]
The resulting res variable is then constant.
I've played a little with the prediction and found that input values below 0.01 give a little change in result. Also I find that the out_activation_ is always identity, though I've set the activation function to be relu.
I'm kind of lost to what might cause this behavior. Why does it seem x needs to be different (normalized?) for fit() than for predict()?
note: as was commented below there is no cross-validation in this example. I am aware of that.
I'd like to reset (randomize) the weights of all layers in my Keras (deep learning) model. The reason is that I want to be able to train the model several times with different data splits without having to do the (slow) model recompilation every time.
Inspired by this discussion, I'm trying the following code:
# Reset weights
for layer in KModel.layers:
    if hasattr(layer,'init'):
        input_dim = layer.input_shape[1]
        new_weights = layer.init((input_dim, layer.output_dim),name='{}_W'.format(layer.name))
        layer.trainable_weights[0].set_value(new_weights.get_value())
However, it only partly works.
Partly, because I've inspected some layer.get_weights() values, and they seem to change. But when I restart the training, the cost values are much lower than the initial cost values on the first run. It's almost as if I've succeeded in resetting some of the weights, but not all of them.
Save the initial weights right after compiling the model but before training it:
model.save_weights('model.h5')
and then after training, "reset" the model by reloading the initial weights:
model.load_weights('model.h5')
This gives you an apples to apples model to compare different data sets and should be quicker than recompiling the entire model.
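The same snapshot-and-restore idea also works in memory with get_weights() and set_weights(), which Keras models provide. The sketch below demonstrates the pattern on a hypothetical stand-in class (TinyModel and fake_train_step are not Keras APIs) so that it runs without Keras installed; with a real model you would call the same two methods on it.

```python
import numpy as np

class TinyModel:
    """Hypothetical stand-in exposing Keras-style get_weights/set_weights."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self._weights = [rng.normal(size=(3, 4)), np.zeros(4)]

    def get_weights(self):
        # Return copies, so the snapshot is not affected by later training.
        return [w.copy() for w in self._weights]

    def set_weights(self, weights):
        self._weights = [np.array(w, copy=True) for w in weights]

    def fake_train_step(self):
        # Pretend training: nudge every parameter.
        self._weights = [w + 0.5 for w in self._weights]

model = TinyModel()
initial_weights = model.get_weights()  # snapshot before any training

model.fake_train_step()
changed = not np.allclose(model.get_weights()[0], initial_weights[0])

model.set_weights(initial_weights)     # "reset" before the next data split
restored = all(np.array_equal(a, b)
               for a, b in zip(model.get_weights(), initial_weights))
print(changed, restored)
```

Note that restoring the layer weights this way does not touch any optimizer state, which may matter for stateful optimizers.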
Reset all layers by checking for initializers:
def reset_weights(model):
    import keras.backend as K
    session = K.get_session()
    for layer in model.layers:
        if hasattr(layer, 'kernel_initializer'):
            layer.kernel.initializer.run(session=session)
        if hasattr(layer, 'bias_initializer'):
            layer.bias.initializer.run(session=session)
Update: kernel_initializer is kernel.initializer now.
If you want to truly re-randomize the weights, and not merely restore the initial weights, you can do the following. The code is slightly different depending on whether you're using TensorFlow or Theano.
from keras.initializers import glorot_uniform # Or your initializer of choice
import keras.backend as K
initial_weights = model.get_weights()
backend_name = K.backend()
if backend_name == 'tensorflow':
    k_eval = lambda placeholder: placeholder.eval(session=K.get_session())
elif backend_name == 'theano':
    k_eval = lambda placeholder: placeholder.eval()
else:
    raise ValueError("Unsupported backend")
new_weights = [k_eval(glorot_uniform()(w.shape)) for w in initial_weights]
model.set_weights(new_weights)
I have found the clone_model function that creates a cloned network with the same architecture but new model weights.
Example of use:
model_cloned = tensorflow.keras.models.clone_model(model_base)
Comparing the weights:
original_weights = model_base.get_weights()
print("Original weights", original_weights)
print("========================================================")
print("========================================================")
print("========================================================")
model_cloned = tensorflow.keras.models.clone_model(model_base)
new_weights = model_cloned.get_weights()
print("New weights", new_weights)
If you execute this code several times, you will notice that the cloned model receives new weights each time.
Tensorflow 2 answer:
for ix, layer in enumerate(model.layers):
    if hasattr(model.layers[ix], 'kernel_initializer') and \
            hasattr(model.layers[ix], 'bias_initializer'):
        weight_initializer = model.layers[ix].kernel_initializer
        bias_initializer = model.layers[ix].bias_initializer
        old_weights, old_biases = model.layers[ix].get_weights()
        model.layers[ix].set_weights([
            weight_initializer(shape=old_weights.shape),
            bias_initializer(shape=old_biases.shape)])
Original weights:
model.layers[1].get_weights()[0][0]
array([ 0.4450057 , -0.13564804, 0.35884023, 0.41411972, 0.24866664,
0.07641453, 0.45726687, -0.04410008, 0.33194816, -0.1965386 ,
-0.38438258, -0.13263905, -0.23807487, 0.40130925, -0.07339832,
0.20535922], dtype=float32)
New weights:
model.layers[1].get_weights()[0][0]
array([-0.4607593 , -0.13104361, -0.0372932 , -0.34242013, 0.12066692,
-0.39146423, 0.3247317 , 0.2635846 , -0.10496247, -0.40134245,
0.19276887, 0.2652442 , -0.18802321, -0.18488845, 0.0826562 ,
-0.23322225], dtype=float32)
K.get_session().close()
K.set_session(tf.Session())
K.get_session().run(tf.global_variables_initializer())
Try set_weights.
For example:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import numpy as np
np.random.seed(1234)
from keras.layers import Input
from keras.layers.convolutional import Convolution2D
from keras.models import Model
print("Building Model...")
inp = Input(shape=(1,None,None))
x = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(inp)
output = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(x)
model_network = Model(input=inp, output=output)
w = np.asarray([
[[[
[0,0,0],
[0,2,0],
[0,0,0]
]]]
])
for layer_i in range(len(model_network.layers)):
    print (model_network.layers[layer_i])
for layer_i in range(1,len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w)
input_mat = np.asarray([
[[
[1.,2.,3.,10.],
[4.,5.,6.,11.],
[7.,8.,9.,12.]
]]
])
print("Input:")
print(input_mat)
print("Output:")
print(model_network.predict(input_mat))
w2 = np.asarray([
[[[
[0,0,0],
[0,3,0],
[0,0,0]
]]]
])
for layer_i in range(1,len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w2)
print("Output:")
print(model_network.predict(input_mat))
Build a model with, say, two convolutional layers:
print("Building Model...")
inp = Input(shape=(1,None,None))
x = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(inp)
output = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(x)
model_network = Model(input=inp, output=output)
Then define your weights (I'm using a simple w, but you could use np.random.uniform or anything like that if you want):
w = np.asarray([
[[[
[0,0,0],
[0,2,0],
[0,0,0]
]]]
])
Take a peek at what are the layers inside a model
for layer_i in range(len(model_network.layers)):
    print (model_network.layers[layer_i])
Set each weight for each convolutional layer (you'll see that the first layer is actually input and you don't want to change that, that's why the range starts from 1 not zero).
for layer_i in range(1,len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w)
Generate some input for your test and predict the output from your model
input_mat = np.asarray([
[[
[1.,2.,3.,10.],
[4.,5.,6.,11.],
[7.,8.,9.,12.]
]]
])
print("Output:")
print(model_network.predict(input_mat))
You could change it again if you want and check again for the output:
w2 = np.asarray([
[[[
[0,0,0],
[0,3,0],
[0,0,0]
]]]
])
for layer_i in range(1,len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w2)
print("Output:")
print(model_network.predict(input_mat))
Sample output:
Using Theano backend.
Building Model...
<keras.engine.topology.InputLayer object at 0x7fc0c619fd50>
<keras.layers.convolutional.Convolution2D object at 0x7fc0c6166250>
<keras.layers.convolutional.Convolution2D object at 0x7fc0c6150a10>
Weights after change:
[array([[[[ 0., 0., 0.],
[ 0., 2., 0.],
[ 0., 0., 0.]]]], dtype=float32)]
Input:
[[[[ 1. 2. 3. 10.]
[ 4. 5. 6. 11.]
[ 7. 8. 9. 12.]]]]
Output:
[[[[ 4. 8. 12. 40.]
[ 16. 20. 24. 44.]
[ 28. 32. 36. 48.]]]]
Output:
[[[[ 9. 18. 27. 90.]
[ 36. 45. 54. 99.]
[ 63. 72. 81. 108.]]]]
From your peek at .layers you can see that the first layer is input and the others your convolutional layers.
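The sample output can be checked by hand. A 3x3 kernel whose only nonzero entry is the centre simply scales each pixel, so two stacked layers with centre 2 multiply the input by 4, and with centre 3 by 9. Here is a minimal NumPy sketch of that arithmetic; conv2d_same is a hypothetical helper written for this check, not the Keras layer, and it assumes a single channel, zero padding and no bias.

```python
import numpy as np

def conv2d_same(image, kernel):
    """3x3 'same' cross-correlation with zero padding (single channel).

    Whether the kernel is flipped does not matter here, because a
    centre-only kernel is symmetric.
    """
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

input_mat = np.array([[1., 2., 3., 10.],
                      [4., 5., 6., 11.],
                      [7., 8., 9., 12.]])
w = np.array([[0., 0., 0.], [0., 2., 0.], [0., 0., 0.]])
w2 = np.array([[0., 0., 0.], [0., 3., 0.], [0., 0., 0.]])

# Two identical layers applied one after the other, as in the model above.
out_w = conv2d_same(conv2d_same(input_mat, w), w)
out_w2 = conv2d_same(conv2d_same(input_mat, w2), w2)
print(out_w)
print(out_w2)
```

Both results match the two "Output:" blocks in the sample output above.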
For tf2 the simplest way to actually reset weights would be:
from tensorflow.keras.models import clone_model

tf_model.set_weights(
    clone_model(tf_model).get_weights()
)
clone_model(), as mentioned by @danielsaromo, returns a new model with trainable parameters initialized from scratch. We use its weights to reinitialize our model, so no model compilation (or knowledge of its loss or optimizer) is needed.
There are two caveats though, first is mentioned in clone_model()'s documentation:
clone_model will not preserve the uniqueness of shared objects within the model (e.g. a single variable attached to two distinct layers will be restored as two separate variables).
Another caveat is that for large models cloning might fail due to memory limit.
To randomly re-initialize the weights of a compiled, untrained model in TF 2.0 (tf.keras):
import random
from tensorflow.keras.initializers import glorot_uniform

weights = [glorot_uniform(seed=random.randint(0, 1000))(w.shape) if w.ndim > 1 else w for w in model.get_weights()]
model.set_weights(weights)
Note the "if w.ndim > 1 else w": you don't want to re-initialize the biases (they stay 0 or 1).
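To make the ndim check concrete, here is a minimal NumPy sketch of the same idea: arrays with more than one dimension are redrawn from a Glorot uniform distribution, and one-dimensional arrays (biases) are left alone. The helpers compute_fans, glorot_uniform_np and reinitialize are hypothetical names for this sketch, and the fan computation assumes channels-last kernels.

```python
import numpy as np

def compute_fans(shape):
    """Fan-in and fan-out, assuming channels-last kernels (ndim >= 2)."""
    if len(shape) == 2:
        return shape[0], shape[1]
    receptive_field = int(np.prod(shape[:-2]))
    return shape[-2] * receptive_field, shape[-1] * receptive_field

def glorot_uniform_np(shape, rng):
    """Draw from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    fan_in, fan_out = compute_fans(shape)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)

def reinitialize(weights, seed=None):
    """Redraw kernels (ndim > 1); keep biases and other 1-D arrays as they are."""
    rng = np.random.default_rng(seed)
    return [glorot_uniform_np(w.shape, rng) if w.ndim > 1 else w.copy()
            for w in weights]

# Placeholder weights in the layout of a single dense layer: kernel, bias.
old_weights = [np.ones((3, 4)), np.zeros(4)]
new_weights = reinitialize(old_weights, seed=0)
print(new_weights[0])
print(new_weights[1])
```

With a real model, the list returned by model.get_weights() would take the place of old_weights.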
Use keras.backend.clear_session().