I would like to define a network that comprises many templates. Below, under Network definitions, is a simplified example where the first network definition is used as a template in the second one. This doesn't work: when I initialise my optimiser, it says that the network parameters are empty!
How should I do this properly? The network that I ultimately want is very complicated.
Main Function
import torch.nn as nn
import torch.optim as optim
if __name__ == "__main__":
    myNet = Network(nNets=2).cuda().train()  # nNets value chosen for illustration
    optimizer = optim.SGD(myNet.parameters(), lr=0.01, momentum=0.9)
Network definitions:
class NetworkTemplate(nn.Module):
    def __init__(self):
        super(NetworkTemplate, self).__init__()
        self.conv1 = nn.Conv2d(1, 3, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(3)
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        return x
class Network(nn.Module):
    def __init__(self, nNets):
        super(Network, self).__init__()
        self.nets = []
        for curNet in range(nNets):
            self.nets.append(NetworkTemplate())
    def forward(self, x):
        for curNet in self.nets:
            x = curNet(x)
        return x
A plain Python list does not register its contents as submodules, so nn.Module.parameters() never sees the templates' weights. Just use torch.nn.Sequential: after you have populated self.nets, wrap it with self.nets = torch.nn.Sequential(*self.nets) and call return self.nets(x) in your forward function.
If you want to do something more complicated, you can put all networks into a torch.nn.ModuleList instead. In that case you'll need to call them yourself in your forward method, but that logic can then be more complicated than a simple sequential chain.
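A minimal sketch of both options, assuming a CPU run (drop in .cuda() as needed). The in_channels/out_channels parameters and the make_templates helper are hypothetical additions: the question's template always expects 1 input channel but outputs 3, so chaining more than one of them unchanged would fail with a channel mismatch.

```python
import torch
import torch.nn as nn

class NetworkTemplate(nn.Module):
    def __init__(self, in_channels=1, out_channels=3):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        return self.bn1(self.conv1(x))

def make_templates(n_nets):
    # Hypothetical helper: the first template sees the 1-channel input, later ones see 3 channels.
    return [NetworkTemplate(1 if i == 0 else 3, 3) for i in range(n_nets)]

class Network(nn.Module):
    def __init__(self, n_nets):
        super().__init__()
        # ModuleList registers every template, so their weights appear in .parameters().
        self.nets = nn.ModuleList(make_templates(n_nets))

    def forward(self, x):
        # Explicit loop: this is where non-sequential logic could go.
        for net in self.nets:
            x = net(x)
        return x

class SequentialNetwork(nn.Module):
    def __init__(self, n_nets):
        super().__init__()
        # Sequential also registers its children and chains them in order.
        self.nets = nn.Sequential(*make_templates(n_nets))

    def forward(self, x):
        return self.nets(x)

my_net = Network(3).train()
optimizer = torch.optim.SGD(my_net.parameters(), lr=0.01, momentum=0.9)
out = my_net(torch.randn(2, 1, 8, 8))
seq_out = SequentialNetwork(3).train()(torch.randn(2, 1, 8, 8))
```

ModuleList is the more flexible choice when forward needs branching or reuse; Sequential is shorter when the templates are simply applied one after another.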
I am using a class, and I want to print the shapes of Yproj and Xproj, which are computed inside a method (def).
I am using return, but maybe I am not doing it right.
class test:
    def __init__(self):
        self.__build_flag = -1
        self.__train_flag = -1
    def build(self, dim, ddim, hdim, num_layers=3,
              activation=tf.nn.tanh, init_lr=1e-5):
        """ Build tensorflow model """
        # Parameters
        self.__random_state = random_state
        # Loss function definition
        Sx, Ux, Vx = tf.linalg.svd(self.__psiNNx, full_matrices=False, compute_uv=True)
        # # Project X and Y onto Principal components
        Yproj = tf.matmul(tf.transpose(Ux), self.__psiNNy)
        Xproj = tf.matmul(tf.transpose(Ux), self.__psiNNx)
        # Build complete
        print("Built tensorflow graph.")
        self.__build_flag = 0
My idea is:
class test:
    def __init__(self):
        ...
    def build(self, dim, ddim, hdim, num_layers=3,
              activation=tf.nn.tanh, init_lr=1e-5):
        """ Build tensorflow model """
        # bla bla bla
        return Xproj, Yproj
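A minimal sketch of the return-based approach, with NumPy standing in for TensorFlow so it runs without a graph or session. The class name Test and the inputs psi_x/psi_y are hypothetical stand-ins for self.__psiNNx/self.__psiNNy. Note that np.linalg.svd returns (u, s, vh), whereas tf.linalg.svd returns (s, u, v). The key point is that the caller must capture the returned tuple; storing the results on self additionally makes them reachable from other methods.

```python
import numpy as np

class Test:
    def build(self, psi_x, psi_y):
        # Stand-in for tf.linalg.svd(psi_x, full_matrices=False)
        u, s, vh = np.linalg.svd(psi_x, full_matrices=False)
        # Project X and Y onto the principal components
        y_proj = u.T @ psi_y
        x_proj = u.T @ psi_x
        # Optional: keep them on the instance so other methods can use them
        self.x_proj, self.y_proj = x_proj, y_proj
        return x_proj, y_proj

model = Test()
x_proj, y_proj = model.build(np.random.rand(6, 4), np.random.rand(6, 4))
print(x_proj.shape, y_proj.shape)
```

Calling model.build(...) without assigning its result discards both arrays, which is a common reason the shapes seem unreachable.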
I am new to PyTorch and have some custom nn.Modules that I would like to run on a GPU. Let's call them M_outer, M_inner, and M_sub. In general, the structure looks like:
import torch.nn as nn
class M_outer(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()  # required before assigning submodules
        self.inner_1 = M_inner(**kwargs)
        self.inner_2 = M_inner(**kwargs)
        # ...
    def forward(self, input):
        result = self.inner_1(input)
        result = self.inner_2(result)
        # ...
        return result
class M_inner(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()
        self.sub_1 = M_sub(**kwargs)
        self.sub_2 = M_sub(**kwargs)
        # ...
    def forward(self, input):
        result = self.sub_1(input)
        result = self.sub_2(result)
        # ...
        return result
class M_sub(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()
        self.emb_1 = nn.Embedding(x, y)
        self.emb_2 = nn.Embedding(x, y)
        # ...
        self.norm = nn.LayerNorm(y)  # normalized_shape is required; y is the embedding dim
    def forward(self, input):
        emb = (self.emb_1(input) + self.emb_2(input))
        # ...
        return self.norm(emb)
and I try to get my module onto a GPU via:
model = M_outer(params).to(device)
Yet I am still getting errors from the embedding layers saying that some operations are on the cpu.
I have read the documentation. I have read useful Discuss posts like this and related StackOverflow posts like this.
I cannot register an nn.Embedding layer via nn.Parameter. What am I missing?
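A small sketch of what .to(device) does and does not move, under the assumption that the real code differs from the simplified structure above. The names Sub and Outer, the sizes, and the extras list are hypothetical. Submodules assigned as attributes (or held in nn.ModuleList) are registered and moved; modules kept in a plain Python list are invisible to .to() and stay on the CPU. The input indices must also be moved to the same device.

```python
import torch
import torch.nn as nn

class Sub(nn.Module):
    def __init__(self, num_embeddings, dim):
        super().__init__()  # must run before any submodule is assigned
        self.emb_1 = nn.Embedding(num_embeddings, dim)
        self.emb_2 = nn.Embedding(num_embeddings, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, idx):
        return self.norm(self.emb_1(idx) + self.emb_2(idx))

class Outer(nn.Module):
    def __init__(self, num_embeddings, dim):
        super().__init__()
        self.inner_1 = Sub(num_embeddings, dim)   # attribute: registered, moved by .to()
        self.extras = [Sub(num_embeddings, dim)]  # plain list: NOT registered

    def forward(self, idx):
        return self.inner_1(idx)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Outer(num_embeddings=10, dim=4).to(device)
idx = torch.tensor([1, 2, 3], device=device)  # inputs must be on the same device
out = model(idx)

print(next(model.parameters()).device)     # the device .to() moved the registered weights to
print(model.extras[0].emb_1.weight.device) # unchanged by .to(device): still cpu
```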
In TensorFlow Federated (TFF), you can pass a broadcast_process and an aggregation_process to tff.learning.build_federated_averaging_process; these can embed customized encoders, e.g. to apply custom compression.
Getting to the point of my question: I am trying to implement an encoder that sparsifies model updates/model weights.
I am trying to build such an encoder by implementing the EncodingStageInterface, from tensorflow_model_optimization.python.core.internal.
However, I am struggling to implement a (local) state that accumulates the zeroed-out coordinates of the model updates/model weights round by round. Note that this state should not be communicated; it only needs to be maintained locally (so AdaptiveEncodingStageInterface should not help). In general, the question is how to maintain a local state inside an encoder that is then passed to the FedAvg process.
I attach the code of my encoder implementation, which, apart from the state I would like to add, works as expected as a stateless encoder.
I then attach the excerpt of my code where I use the encoder implementation.
If I uncomment the commented-out parts in stateful_encoding_stage_topk.py, the code does not work: I can't figure out how to manage the state (which is a Tensor) in TF non-eager (graph) mode.
stateful_encoding_stage_topk.py
import tensorflow as tf
import numpy as np
from tensorflow_model_optimization.python.core.internal import tensor_encoding as te
@te.core.tf_style_encoding_stage
class StatefulTopKEncodingStage(te.core.EncodingStageInterface):
    ENCODED_VALUES_KEY = 'stateful_topk_values'
    INDICES_KEY = 'indices'
    def __init__(self):
        super().__init__()
        # Here I would like to init my state
        #self.A = tf.zeros([800], dtype=tf.float32)
    @property
    def name(self):
        """See base class."""
        return 'stateful_topk'
    @property
    def compressible_tensors_keys(self):
        """See base class."""
        return [self.ENCODED_VALUES_KEY]
    @property
    def commutes_with_sum(self):
        """See base class."""
        return True
    @property
    def decode_needs_input_shape(self):
        """See base class."""
        return True
    def get_params(self):
        """See base class."""
        return {}, {}
    def encode(self, x, encode_params):
        """See base class."""
        del encode_params  # Unused.
        dW = tf.reshape(x, [-1])
        # Here I would like to retrieve the state
        A = tf.zeros([800], dtype=tf.float32)
        #A = self.residual
        dW_and_A = tf.math.add(A, dW)
        percentage = tf.constant(0.4, dtype=tf.float32)
        k_float = tf.multiply(percentage, tf.cast(tf.size(dW), tf.float32))
        k_int = tf.cast(tf.math.round(k_float), dtype=tf.int32)
        values, indices = tf.math.top_k(tf.math.abs(dW_and_A), k=k_int, sorted=False)
        indices = tf.expand_dims(indices, 1)
        sparse_dW = tf.scatter_nd(indices, values, tf.shape(dW_and_A))
        # Here I would like to update the state
        A_updated = tf.math.subtract(dW_and_A, sparse_dW)
        #self.A = A_updated
        encoded_x = {self.ENCODED_VALUES_KEY: values,
                     self.INDICES_KEY: indices}
        return encoded_x
    def decode(self,
               encoded_tensors,
               decode_params,
               num_summands=None,
               shape=None):
        """See base class."""
        del decode_params, num_summands  # Unused.
        indices = encoded_tensors[self.INDICES_KEY]
        values = encoded_tensors[self.ENCODED_VALUES_KEY]
        tensor = tf.fill([800], 0.0)
        decoded_values = tf.tensor_scatter_nd_update(tensor, indices, values)
        return tf.reshape(decoded_values, shape)
def sparse_quantizing_encoder():
    encoder = te.core.EncoderComposer(
        StatefulTopKEncodingStage())
    return encoder.make()
fedavg_with_sparsification.py
[...]
def sparsification_broadcast_encoder_fn(value):
    spec = tf.TensorSpec(value.shape, value.dtype)
    return te.encoders.as_simple_encoder(te.encoders.identity(), spec)
def sparsification_mean_encoder_fn(value):
    spec = tf.TensorSpec(value.shape, value.dtype)
    if value.shape.num_elements() == 800:
        return te.encoders.as_gather_encoder(
            stateful_encoding_stage_topk.sparse_quantizing_encoder(), spec)
    else:
        return te.encoders.as_gather_encoder(te.encoders.identity(), spec)
encoded_broadcast_process = (
    tff.learning.framework.build_encoded_broadcast_process_from_model(
        model_fn, sparsification_broadcast_encoder_fn))
encoded_mean_process = (
    tff.learning.framework.build_encoded_mean_process_from_model(
        model_fn, sparsification_mean_encoder_fn))
iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.004),
    client_weight_fn=lambda _: tf.constant(1.0),
    broadcast_process=encoded_broadcast_process,
    aggregation_process=encoded_mean_process)
[...]
I am using:
tensorflow 2.4.0
tensorflow-federated 0.17.0
I'll try to answer in two parts: (1) the top_k encoder without state and (2) realizing the stateful idea you seem to want in TFF.
(1)
To get the TopKEncodingStage working without state, I see a few details to change.
The commutes_with_sum property should be set to False. In pseudo-code, it states whether sum_x(decode(encode(x))) == decode(sum_x(encode(x))). This does not hold for the representation your encode method returns: summing indices from different encoded tensors produces meaningless positions. I also think the implementation of the decode method can be simplified as below; since the indices address the flattened tensor, scatter into a 1-D tensor first and then reshape to shape:
flat = tf.scatter_nd(
    indices=encoded_tensors[self.INDICES_KEY],
    updates=encoded_tensors[self.ENCODED_VALUES_KEY],
    shape=[tf.math.reduce_prod(shape)])
return tf.reshape(flat, shape)
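To make the commutes_with_sum point concrete, here is a small NumPy sketch. It is not the TFMOT API: encode and decode are hypothetical stand-ins that mimic the values/indices representation, with k=1 and two "clients". Decoding then summing gives the expected sum, while summing the encoded dicts field by field and then decoding does not.

```python
import numpy as np

def encode(x, k):
    # Keep the k largest-magnitude entries as (values, indices).
    idx = np.argsort(-np.abs(x))[:k]
    return {"values": x[idx], "indices": idx}

def decode(enc, n):
    # Scatter the kept values back into a dense vector of length n.
    out = np.zeros(n)
    out[enc["indices"]] = enc["values"]
    return out

x1 = np.array([5.0, 0.1, 0.2, 0.0])
x2 = np.array([0.0, 0.1, 0.3, 4.0])
n, k = 4, 1

# sum_x(decode(encode(x)))
lhs = decode(encode(x1, k), n) + decode(encode(x2, k), n)

# decode(sum_x(encode(x))): summing indices 0 and 3 yields index 3
e1, e2 = encode(x1, k), encode(x2, k)
summed = {"values": e1["values"] + e2["values"],
          "indices": e1["indices"] + e2["indices"]}
rhs = decode(summed, n)
```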
(2)
What you describe cannot be achieved this way using tff.learning.build_federated_averaging_process. The process returned by this method has no mechanism for maintaining client/local state. Any state expressed in your StatefulTopKEncodingStage would end up as server state, not local state.
To work with client/local state, you may need to write more custom code. As a starting point, see examples/stateful_clients, which you can adapt to store the state you refer to.
Keep in mind that in TFF, this will need to be expressed as functional transformations. Storing values in attributes of a class and using them elsewhere can lead to surprising errors.
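The functional style can be sketched as a pure function that takes the residual as input and returns the updated residual alongside the sparse update, instead of reading and writing self.A. This is a NumPy illustration of the idea only, not TFF or TFMOT API; topk_with_residual is a hypothetical name, and in TFF the residual would have to be carried between rounds by client-side code (for example, adapted from examples/stateful_clients). Unlike the question's encode, which returns the absolute values from top_k, it keeps the signed values.

```python
import numpy as np

def topk_with_residual(dw, residual, fraction=0.4):
    """Pure function: returns (sparse_update, new_residual); nothing is stored on self."""
    acc = dw + residual                       # add back what was zeroed out earlier
    k = int(round(fraction * acc.size))
    idx = np.argsort(-np.abs(acc))[:k]        # positions of the k largest magnitudes
    sparse = np.zeros_like(acc)
    sparse[idx] = acc[idx]                    # keep signed values
    return sparse, acc - sparse               # new residual = the zeroed-out coordinates

updates = [np.array([1.0, -3.0, 0.5, 2.0, 0.1]),
           np.array([0.2, 0.1, 0.6, 0.0, 0.3])]
residual = np.zeros(5)
sent = np.zeros(5)
for dw in updates:                            # one iteration per round
    sparse, residual = topk_with_residual(dw, residual)
    sent += sparse
```

Because the residual is an explicit input and output, the same function can be traced in graph mode without hidden Python-side mutation.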
I have subclassed RNNCell as the building block of my RNN. I pass an instance of it to tf.nn.dynamic_rnn and then define a prediction function in my Agent class:
class Agent():
    def __init__(self):
        ...
    def predictions(self):
        cell = RNNCell()
        output, last_state = tf.nn.dynamic_rnn(cell, inputs=...)
        return output
Everything works fine, but how do I add a histogram for the layers now? I've tried to do it in the RNNCell but it doesn't work:
class RNNCell(tf.nn.rnn_cell.RNNCell):
    def __init__(self):
        super(RNNCell, self).__init__()
        self._output_size = 15
        self._state_size = 15
        self._histogram1 = None
    def __call__(self, X, state):
        network = tflearn.layers.conv_2d(X, 5, [1, 3], activation='relu', weights_init=tflearn.initializations.variance_scaling(), padding="valid")
        self._histogram1 = tf.summary.histogram("layer1_hist_summary", network)
        ...
    @property
    def histogram1(self):
        return self._histogram1
and then
class Agent():
    def __init__(self):
        ...
    def predictions(self):
        cell = RNNCell()
        self.histogram1 = cell.histogram1
        output, last_state = tf.nn.dynamic_rnn(cell, inputs=...)
        return output
Later when I run sess.run(agent.histogram1, feed_dict=...) I get the error TypeError: Fetch argument None has invalid type <class 'NoneType'>
I think the problem is that the value of Agent's self.histogram1 never got updated to reflect that summary assigned in RNNCell.
Your code for the Agent predictions() method initializes Agent's histogram1 value to None here:
cell = RNNCell()  # invokes __init__(), so RNNCell's _histogram1 is now None
self.histogram1 = cell.histogram1
When RNNCell's __call__() method is invoked, it rebinds RNNCell's own _histogram1:
self._histogram1 = tf.summary.histogram("layer1_hist_summary", network)
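But Agent's copy of histogram1 is not updated: the assignment copied the value the property returned at that moment (None), and rebinding cell._histogram1 to a new summary op later does not change what agent.histogram1 refers to. So when the call is made: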
sess.run(agent.histogram1, feed_dict=...)
agent.histogram1 is still None.
I don't see in the posted code where the summaries were merged before training, so the missing step is likely in unposted code somewhere.
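The ordering problem can be reproduced in plain Python, without TensorFlow. In this sketch Cell and run are hypothetical stand-ins: the string assignment plays the role of tf.summary.histogram(...), and run plays the role of tf.nn.dynamic_rnn calling the cell. Reading the property before the call captures None; reading it after the call returns the new object, which suggests moving self.histogram1 = cell.histogram1 below the tf.nn.dynamic_rnn line.

```python
class Cell:
    def __init__(self):
        self._histogram1 = None

    def __call__(self, x):
        # Stand-in for tf.summary.histogram(...): rebinds the attribute to a new object.
        self._histogram1 = f"summary({x})"
        return x

    @property
    def histogram1(self):
        return self._histogram1

def run(cell, x):
    # Hypothetical stand-in for tf.nn.dynamic_rnn, which calls the cell.
    return cell(x)

cell = Cell()
too_early = cell.histogram1  # captured before the cell was called
run(cell, "input")
after = cell.histogram1      # read after the call
```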
Can someone help me understand the Python code below? I am trying to figure out what "output = layer.ingredients(input_)" does. I also don't understand how the class handles the object being passed in. For example, how would one use the parameter in the constructor?
class Lasagna(Layer):
    def __init__(self, layers):
        self.layers = layers
    def ingredients(self, input_):
        # We remember the inputs for each layer so that we can use them
        self.inputs = []
        for layer in self.layers:
            self.inputs.append(input_)
            output = layer.ingredients(input_)
            input_ = output
        return output
lasagna = Lasagna([Linear(784, 100), Function1(), Linear(100, 10), Function2()])
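A runnable sketch of how the pieces fit together. Layer, Linear, Function1 and Function2 are not defined in the question, so the versions here are hypothetical: Function1 is assumed to be a ReLU and Function2 a softmax. The list passed to the constructor is simply stored in self.layers; ingredients then calls each layer's own ingredients method in turn (each class supplies its own implementation), feeding every output in as the next layer's input and recording what each layer received.

```python
import numpy as np

class Layer:
    """Hypothetical base class: every layer maps an input array to an output array."""
    def ingredients(self, input_):
        raise NotImplementedError

class Linear(Layer):
    def __init__(self, n_in, n_out):
        rng = np.random.default_rng(0)
        self.W = rng.standard_normal((n_in, n_out)) * 0.01
        self.b = np.zeros(n_out)

    def ingredients(self, input_):
        return input_ @ self.W + self.b

class Function1(Layer):  # assumed to be ReLU
    def ingredients(self, input_):
        return np.maximum(input_, 0.0)

class Function2(Layer):  # assumed to be softmax over the last axis
    def ingredients(self, input_):
        e = np.exp(input_ - input_.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

class Lasagna(Layer):
    def __init__(self, layers):
        self.layers = layers  # the list passed to the constructor is just stored

    def ingredients(self, input_):
        self.inputs = []
        for layer in self.layers:
            self.inputs.append(input_)          # remember what this layer received
            output = layer.ingredients(input_)  # each layer's own ingredients() runs here
            input_ = output                     # the output feeds the next layer
        return output

lasagna = Lasagna([Linear(784, 100), Function1(), Linear(100, 10), Function2()])
batch = np.ones((2, 784))
probs = lasagna.ingredients(batch)
```

Because Lasagna is itself a Layer with an ingredients method, one Lasagna could also be nested inside another's layer list.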
Thanks