I load features and labels from my training dataset. Both are originally NumPy arrays, but I convert them to torch tensors using torch.from_numpy(features.copy()) and torch.tensor(labels.astype(np.bool)).
I also noticed that torch.autograd.Variable seems to be something like a placeholder in TensorFlow.
When training my network, I first tried:
features = features.cuda()
labels = labels.cuda()
outputs = Config.MODEL(features)
loss = Config.LOSS(outputs, labels)
Then I tried
features = features.cuda()
labels = labels.cuda()
input_var = Variable(features)
target_var = Variable(labels)
outputs = Config.MODEL(input_var)
loss = Config.LOSS(outputs, target_var)
Both blocks train successfully, but I am worried that there might be a subtle difference between them.
According to this question you no longer need Variables to use PyTorch autograd.
Thanks to @skytree, we can make this even more explicit: Variables have been deprecated, i.e. you're not supposed to use them anymore.
Autograd automatically supports Tensors with requires_grad set to True.
And more importantly
Variable(tensor) and Variable(tensor, requires_grad) still work as expected, but they return Tensors instead of Variables.
This means that if your features and labels are already tensors (which they seem to be in your example), Variable(features) and Variable(labels) simply return a tensor again.
The original purpose of Variables was to be able to use automatic differentiation (Source):
Variables are just wrappers for the tensors so you can now easily auto compute the gradients.
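As a quick check of the statements above, here is a minimal sketch, assuming PyTorch 0.4 or later. The tensor shapes and the names features and weights are made up for illustration. It shows that wrapping a tensor in Variable just gives back a tensor, and that autograd works on a plain tensor created with requires_grad=True:

```python
import torch
from torch.autograd import Variable  # deprecated wrapper, kept for backward compatibility

# Hypothetical stand-in for the features loaded from the dataset.
features = torch.randn(4, 3)

# Wrapping an existing tensor returns a tensor holding the same values.
wrapped = Variable(features)
print(isinstance(wrapped, torch.Tensor))
print(torch.equal(wrapped, features))

# Autograd tracks any tensor that has requires_grad=True; no Variable needed.
weights = torch.randn(3, 1, requires_grad=True)
loss = (features @ weights).sum()
loss.backward()
print(weights.grad.shape)
```

So the two training blocks in the question should behave the same on a recent PyTorch version.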
I'm trying to write a reinforcement learning agent using TensorFlow. I'm wondering whether the states should be tf.Variables or whether they can be NumPy arrays for backpropagation using GradientTape. I'm not sure if the gradients will be correct if my state/action arrays are NumPy arrays instead of TensorFlow tensors. I do know that the loss function returns a tf.Variable, however. Thanks. I'm still a beginner with TensorFlow, so any explanation or suggestions would help a lot.
In a very simplified form (not word for word), my code looks something like:
with tf.GradientTape() as tape:
    # actions/states are both lists of np arrays
    action = model.call(state)
    states.append(state)
    actions.append(action)
    loss = model.loss(states, actions)  # loss returns tf.Variable
gradients = tape.gradient(loss, model.variables)
model.optimizer.apply_gradients(zip(gradients, model.variables))
Hi Noob :) The optimizer.apply_gradients operation will update only model tf.Variables having non-zero gradients (see input argument model.variables).
Reference: https://www.tensorflow.org/api_docs/python/tf/GradientTape
Trainable variables (created by tf.Variable or
tf.compat.v1.get_variable, where trainable=True is default in both
cases) are automatically watched. Tensors can be manually watched by
invoking the watch method on this context manager.
Edit: if you want to call the model to make predictions given a NumPy array, this is possible. According to the documentation, the input of model.call() should be a tensor object. You can get a tensor from your NumPy array as follows:
state # numpy array
tf_state = tf.constant(state)
model.call(tf_state)
Of course, instead of creating a new tf.constant on each iteration of the training loop, you can first initialize a (non-trainable) tf.Variable and then just update its value with that of the NumPy array. Something like the following should work:
tf_state = tf.Variable(np.zeros_like(state), dtype=tf.float32, trainable=False)
for _ in range(n_train_iterations):
    state = get_new_numpy_state()
    tf_state.assign(state)
    model.call(tf_state)
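To illustrate which gradients the tape actually produces, here is a small sketch assuming TensorFlow 2.x with eager execution. The variable w and the state values are invented for the example and stand in for a real model and environment. The NumPy input is converted to a constant tensor, the trainable variable is watched automatically, and the unwatched input gets no gradient:

```python
import numpy as np
import tensorflow as tf

# Hypothetical one-layer "model": a single trainable weight matrix.
w = tf.Variable([[1.0], [2.0]])

# The state arrives as a plain NumPy array.
state = np.array([[3.0, 4.0]], dtype=np.float32)

with tf.GradientTape(persistent=True) as tape:
    tf_state = tf.constant(state)  # NumPy array -> tensor (not watched)
    loss = tf.reduce_sum(tf.matmul(tf_state, w))

# Trainable variables are watched automatically, so this gradient exists.
grad_w = tape.gradient(loss, w)
# The constant input was never watched, so its gradient is None.
grad_state = tape.gradient(loss, tf_state)
del tape  # release the persistent tape

print(grad_w.numpy())
print(grad_state)
```

In other words, feeding NumPy states should not affect the gradients with respect to the model's variables, as long as the loss is computed from the model's output inside the tape.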
I initialized nn.Embedding with some pretrained parameters (128-dimensional vectors). The following code shows how I do this:
self.myvectors = gensim.models.KeyedVectors.load_word2vec_format(cfg.vec_dir)
self.vec_weights = torch.FloatTensor(self.myvectors.vectors)
self.embeds = torch.nn.Embedding.from_pretrained(self.vec_weights)
cfg is a JSON config file in which vec_dir gives the path of the pretrained 128-dimensional vectors I used to initialize this layer.
After training the model, I printed this embedding layer and found that the parameters are exactly the same as when I initialized them, so clearly they are not updated during training. Why is this happening? What should I do to update these vectors?
The torch.nn.Embedding.from_pretrained classmethod by default freezes the parameters. If you want to train the parameters, you need to set the freeze keyword argument to False. See the documentation.
So you might try this instead:
self.embeds = torch.nn.Embedding.from_pretrained(self.vec_weights, freeze=False)
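Here is a minimal sketch of the difference. The random matrix is only a stand-in for your word2vec vectors, and the vocabulary size of 10 is arbitrary. It checks requires_grad for both variants and runs one SGD step on the unfrozen layer:

```python
import torch

# Hypothetical stand-in for the pretrained vectors (10 words, 128 dims).
pretrained = torch.randn(10, 128)

frozen = torch.nn.Embedding.from_pretrained(pretrained.clone())
trainable = torch.nn.Embedding.from_pretrained(pretrained.clone(), freeze=False)

print(frozen.weight.requires_grad)     # default freeze=True
print(trainable.weight.requires_grad)  # freeze=False

# One optimisation step on the trainable layer, using a dummy loss.
before = trainable.weight.detach().clone()
optimizer = torch.optim.SGD(trainable.parameters(), lr=0.1)
loss = trainable(torch.tensor([0, 1])).sum()
loss.backward()
optimizer.step()

# Only the rows that were looked up (0 and 1) receive a gradient and change.
changed = not torch.equal(before, trainable.weight.detach())
print(changed)
```

Note that the optimizer also has to be given the embedding parameters (for example via model.parameters()) for the update to happen.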
How would one best add a preprocessing layer (e.g., subtract the mean and divide by the std) to a Keras (v2.0.5) model so that the model becomes fully self-contained for deployment (possibly in a C++ environment)? I tried:
def getmodel():
    model = Sequential()
    mean_tensor = K.placeholder(shape=(1,1,3), name="mean_tensor")
    std_tensor = K.placeholder(shape=(1,1,3), name="std_tensor")
    preproc_layer = Lambda(lambda x: (x - mean_tensor) / (std_tensor + K.epsilon()),
                           input_shape=im_shape)
    model.add(preproc_layer)
    # Build the remaining model, perhaps set weights,
    ...
    return model
Then, somewhere else set the mean/std on the model. I found the set_value function so tried the following:
m = getmodel()
mean, std = get_mean_std(..)
graph = K.get_session().graph
mean_tensor = graph.get_tensor_by_name("mean_tensor:0")
std_tensor = graph.get_tensor_by_name("std_tensor:0")
K.set_value(mean_tensor, mean)
K.set_value(std_tensor, std)
However the set_value fails with
AttributeError: 'Tensor' object has no attribute 'assign'
So set_value does not work as (the limited) docs would suggest. What would the proper way be to do this? Get the TF session, wrap all the training code in a with (session) and use feed_dict? I would have thought there would be a native keras way to set tensor values.
Instead of using a placeholder I tried setting the mean/std on model construction using either K.variable or K.constant:
mean_tensor = K.variable(mean, name="mean_tensor")
std_tensor = K.variable(std, name="std_tensor")
This avoids any set_value problems. Though I notice that if I try to train that model (which I know is not particularly efficient as you are re-doing the normalisation for every image) it works but at the end of the first epoch the ModelCheckpoint handler fails with a very deep stack trace:
...
File "/Users/dgorissen/Library/Python/2.7/lib/python/site-packages/keras/models.py", line 102, in save_model
'config': model.get_config()
File "/Users/dgorissen/Library/Python/2.7/lib/python/site-packages/keras/models.py", line 1193, in get_config
return copy.deepcopy(config)
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
...
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 343, in _reconstruct
y.__dict__.update(state)
AttributeError: 'NoneType' object has no attribute 'update'
Update 1:
I also tried a different approach. Train a model as normal, then just prepend a second model that does the preprocessing:
# Regular model, trained as usual
model = ...
# Preprocessing model
preproc_model = Sequential()
mean_tensor = K.constant(mean, name="mean_tensor")
std_tensor = K.constant(std, name="std_tensor")
preproc_layer = Lambda(lambda x: (x - mean_tensor) / (std_tensor + K.epsilon()),
input_shape=im_shape, name="normalisation")
preproc_model.add(preproc_layer)
# Prepend the preprocessing model to the regular model
full_model = Model(inputs=[preproc_model.input],
outputs=[model(preproc_model.output)])
# Save the complete model to disk
full_model.save('full_model.hdf5')
This seems to work until the save() call, which fails with the same deep stack trace as above.
Perhaps the Lambda layer is the problem, but judging from this issue it seems it should serialise properly.
So overall, how do I add a normalisation layer to a Keras model without compromising the ability to serialise (and export to pb)?
I'm sure you can get it working by dropping down to TF directly (e.g. this thread, or using tf.Transform), but I would have thought it would be possible in Keras directly.
Update 2:
So I found that the deep stack trace could be avoided by doing
def foo(x):
    bar = K.variable(baz, name="baz")
    return x - bar
That is, defining bar inside the function instead of capturing it from the outer scope.
I then found I could save to disk but could not load from disk. There is a suite of GitHub issues around this. I used the workaround specified in #5396 to pass all variables in as arguments, which then allowed me to save and load.
Thinking I was almost there, I continued with my approach from Update 1 above of stacking a preprocessing model in front of a trained model.
This then led to "Model is not compiled" errors. I worked around those, but in the end I never managed to get the following to work:
Build and train a model
Save it to disk
Load it, prepend a preprocessing model
Export the stacked model to disk as a frozen pb file
Load the frozen pb from disk
Apply it on some unseen data
I got it to the point where there were no errors, but could not get the normalisation tensors to propagate through to the frozen pb. Having spent too much time on this I then gave up and switched to the somewhat less elegant approach of:
Build a model with the preprocessing operations in the model from the start but set to a no-op (mean=0, std=1)
Train the model, build an identical model but this time with the proper values for mean/std.
Transfer the weights
Export and freeze the model to pb
All this now fully works as expected. Small overhead on training but negligible for me.
Still failed to figure out how one would set the value of a tensor variable in keras (without raising the assign exception) but can do without it for now.
Will accept @Daniel's answer as it got me going in the right direction.
Related question:
Add Tensorflow pre-processing to existing Keras model (for use in Tensorflow Serving)
When creating a variable, you must give it the "value", not the shape:
mean_tensor = K.variable(mean, name="mean_tensor")
std_tensor = K.variable(std, name="std_tensor")
Now, in Keras, you don't have to deal with session, graph and things like that. You work only with layers, and inside Lambda layers (or loss functions) you may work with tensors.
For our Lambda layer, we need a more complex function, because shapes must match before you do a calculation. Since I don't know im_shape, I assumed it has 3 dimensions:
def myFunc(x):
    # reshape x so that it's compatible with the tensors mean and std:
    x = K.reshape(x, (-1,1,1,3))
    # -1 is like a wildcard; it will be the value that matches the rest of the given shape.
    # I chose (1,1,3) because it's the same shape as mean_tensor and std_tensor
    result = (x - mean_tensor) / (std_tensor + K.epsilon())
    # now shape it back to the same shape it was before (which I don't know)
    return K.reshape(result, (-1, im_shape[0], im_shape[1], im_shape[2]))
    # -1 is still necessary; it's the batch size
Now we create the Lambda layer, considering it needs also an output shape (because of your custom operation, the system does not necessarily know the output shape)
model.add(Lambda(myFunc,input_shape=im_shape, output_shape=im_shape))
After this, just compile the model and train it. (Often with model.compile(...) and model.fit(...))
If you want to include everything, including the computation of the mean and std, inside the function, that works too:
def myFunc(x):
    mean_tensor = K.mean(x, axis=[0,1,2])  # considering shapes of (size, width, height, channels)
    std_tensor = K.std(x, axis=[0,1,2])
    x = K.reshape(x, (-1,3))  # shapes of mean and std are (3,) here.
    result = (x - mean_tensor) / (std_tensor + K.epsilon())
    return K.reshape(result, (-1,width,height,3))
Now, all this is extra calculation in your model and will consume processing.
It's better to just do everything outside the model. Create the preprocessed data first and store it, then create the model without this preprocessing layer. This way you get a faster model. (It can be important if your data or your model is too big).
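For the "outside the model" option, a rough sketch with NumPy could look like this. The random images array is a hypothetical stand-in for your data, and eps mirrors the default value of K.epsilon(). Broadcasting takes care of matching the (3,) statistics against the trailing channel axis:

```python
import numpy as np

# Hypothetical dataset: 8 RGB images of 32x32 pixels, channels last.
rng = np.random.default_rng(0)
images = rng.uniform(0, 255, size=(8, 32, 32, 3)).astype(np.float32)

# Per-channel statistics over batch, height and width -> shape (3,).
mean = images.mean(axis=(0, 1, 2))
std = images.std(axis=(0, 1, 2))
eps = 1e-7  # same role as K.epsilon()

# (8, 32, 32, 3) - (3,) broadcasts over the last axis.
normalised = (images - mean) / (std + eps)

print(normalised.shape)
print(normalised.mean(axis=(0, 1, 2)))
print(normalised.std(axis=(0, 1, 2)))
```

You would then store mean and std alongside the model, since the same values have to be applied to unseen data at inference time.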
I'm trying to train an LSTM in Tensorflow using minibatches, but after training is complete I would like to use the model by submitting one example at a time to it. I can set up the graph within Tensorflow to train my LSTM network, but I can't use the trained result afterward in the way I want.
The setup code looks something like this:
#Build the LSTM model.
cellRaw = rnn_cell.BasicLSTMCell(LAYER_SIZE)
cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS)
cell = rnn_cell.DropoutWrapper(cellRaw, output_keep_prob = 0.25)
input_data = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None, 3])
target_data = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None])
initial_state = cell.zero_state(batch_size=BATCH_SIZE, dtype=tf.float32)
with tf.variable_scope('rnnlm'):
    output_w = tf.get_variable("output_w", [LAYER_SIZE, 6])
    output_b = tf.get_variable("output_b", [6])
outputs, final_state = seq2seq.rnn_decoder(input_list, initial_state, cell, loop_function=None, scope='rnnlm')
output = tf.reshape(tf.concat(1, outputs), [-1, LAYER_SIZE])
output = tf.nn.xw_plus_b(output, output_w, output_b)
...Note the two placeholders, input_data and target_data. I haven't bothered including the optimizer setup. After training is complete and the training session closed, I would like to set up a new session that uses the trained LSTM network whose input is provided by a completely different placeholder, something like:
with tf.Session() as sess:
    with tf.variable_scope("simulation", reuse=None):
        cellSim = cellRaw
        input_data_sim = tf.placeholder(dtype=tf.float32, shape=[1, 1, 3])
        initial_state_sim = cell.zero_state(batch_size=1, dtype=tf.float32)
        input_list_sim = tf.unpack(input_data_sim)
        outputsSim, final_state_sim = seq2seq.rnn_decoder(input_list_sim, initial_state_sim, cellSim, loop_function=None, scope='rnnlm')
        outputSim = tf.reshape(tf.concat(1, outputsSim), [-1, LAYER_SIZE])
        with tf.variable_scope('rnnlm'):
            output_w = tf.get_variable("output_w", [LAYER_SIZE, nOut])
            output_b = tf.get_variable("output_b", [nOut])
        outputSim = tf.nn.xw_plus_b(outputSim, output_w, output_b)
This second part returns the following error:
tensorflow.python.framework.errors.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float
[[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
...Presumably because the graph I'm using still has the old training placeholders attached to the trained LSTM nodes. What's the right way to 'extract' the trained LSTM and put it into a new, different graph that has a different style of inputs? The variable scoping features that Tensorflow has seem to address something like this, but the examples in the documentation all talk about using variable scope as a way of managing variable names so that the same piece of code will generate similar subgraphs within the same graph. The 'reuse' feature seems to be close to what I want, but I don't find the Tensorflow documentation linked above to be clear at all on what it does. The cells themselves cannot be given a name (in other words,
cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS, name="multicell")
is not valid), and while I can give a name to a seq2seq.rnn_decoder(), I presumably wouldn't be able to remove the rnn_cell.DropoutWrapper() if I used that node unchanged.
Questions:
What is the proper way to move trained LSTM weights from one graph to another?
Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?
It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.
In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?
If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?
Thanks!
What is the proper way to move trained LSTM weights from one graph to another?
You can create your decoding graph first (with a saver object to save the parameters) and create a GraphDef object that you can import in your bigger training graph:
basegraph = tf.Graph()
with basegraph.as_default():
    # ... your graph ...
traingraph = tf.Graph()
with traingraph.as_default():
    tf.import_graph_def(basegraph.as_graph_def())
    # ... your training graph ...
Make sure you load your variables when you start a session for the new graph.
I don't have experience with this functionality, so you may have to look into it a bit more.
Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?
Yes, the graph object still holds it.
It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.
No, reuse determines the behaviour when you call get_variable on an existing name: when it is true, the existing variable is returned; otherwise a new one is created. Normally TensorFlow should not throw an error. Are you sure you're using tf.get_variable and not just tf.Variable?
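A minimal sketch of this behaviour, assuming a TensorFlow 2.x install where the TF1-style API is available under tf.compat.v1 (in the TensorFlow version used in the question these are simply tf.variable_scope and tf.get_variable). The shapes and the scope name "other" are arbitrary:

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style variable scopes, graph mode only

graph = tf.Graph()
with graph.as_default():
    # First call in the scope creates the variable.
    with tf1.variable_scope("rnnlm"):
        w1 = tf1.get_variable("output_w", [4, 6])

    # Same scope with reuse=True returns the existing variable.
    with tf1.variable_scope("rnnlm", reuse=True):
        w2 = tf1.get_variable("output_w", [4, 6])

    # Same variable name in a different scope is a separate variable.
    with tf1.variable_scope("other"):
        w3 = tf1.get_variable("output_w", [4, 6])

print(w1 is w2)
print(w1.name, w3.name)
```

So reuse does not search other scopes; a variable is identified by its full scoped name, and identical short names in different scopes do not clash.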
In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?
I don't really see what you mean. Placeholders do not always have to be used. If a placeholder is not required for running an operation, you don't have to feed it.
If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?
I think your understanding or usage of scopes is flawed; see above.
I'm currently working on a quaternionic neural network using TensorFlow (I want to use GPUs). TensorFlow doesn't have support for quaternions, but you can represent them as 4x4 real matrices, so it might be possible to build such a neural network in TensorFlow.
Is there a simple way to add a custom operation or to do a custom operation on tensors?
For example, I can write:
output_activation = tf.nn.softmax(tf.matmul(hidden_activation, Weight_to_output))
...and that's pretty cool! All you have to do is add a loss function and then do backpropagation. However, I want to do the same thing but with quaternions, for example:
output_activation = mySigmoid(myFunction(hidden_activation, Weight_to_output))
However, I need to transform the quaternions to and from tensors to optimize the GPU calculation. So I need to create a function that gets some tensors as parameters and returns the transformed tensors.
I've looked at py_func, but it seems that you can't return tensors.
I tried the following, but it failed:
def layerActivation(inputTensor, WeightTensor):
    newTensor = tf.matmul(inputTensor, WeightTensor)
    return newTensor
...and in main():
x = placeholder ...
W_to_hidden = tf.Variable
test = tf.py_func(layerActivation, [x, W_to_hidden], [tf.float32])
with tf.Session() as sess:
    tf.initialize_all_variables().run()
    king_return = sess.run(test, feed_dict={x: qtrain})
Error : Unimplemented: Unsupported object type Tensor
Ideally I could use this output_activation in the standard backprop algorithm of TensorFlow but I don't know if it's possible.
Depending on the functionality required, you might be able to implement your operation as a composition of existing TensorFlow ops, without needing to use tf.py_func().
For example, the following works and will run on a GPU:
def layer_activation(input_tensor, weight_tensor):
    return tf.matmul(input_tensor, weight_tensor)
# ...
x = tf.placeholder(...)
W_to_hidden = tf.Variable(...)
test = layer_activation(x, W_to_hidden)
# ...
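As an illustration of the "composition of existing ops" idea applied to quaternions, here is a sketch assuming TensorFlow 2.x and a layout where the last axis of a tensor holds the four components (a, b, c, d). The function name quaternion_multiply is made up for this example. It implements the Hamilton product using only tf.unstack, arithmetic and tf.stack, so it can run on a GPU and is differentiable by TensorFlow:

```python
import tensorflow as tf

def quaternion_multiply(p, q):
    """Hamilton product of quaternions stored as (..., 4) tensors."""
    a1, b1, c1, d1 = tf.unstack(p, axis=-1)
    a2, b2, c2, d2 = tf.unstack(q, axis=-1)
    return tf.stack([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    ], axis=-1)

# Sanity check with the basis elements: i * j should equal k.
i = tf.constant([[0.0, 1.0, 0.0, 0.0]])
j = tf.constant([[0.0, 0.0, 1.0, 0.0]])
product = quaternion_multiply(i, j)
print(product.numpy())
```

Whether this layout or the 4x4 real-matrix representation is faster on your hardware is something you would have to measure.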
The main reason to use tf.py_func() is if your operations cannot be implemented using TensorFlow operations, and you want to inject some Python code (e.g. using NumPy) that works on the actual values of your tensor.
However, if your mySigmoid() or myFunction() operations cannot be implemented in terms of existing TensorFlow operations, and you want to implement them on GPU, then—as keveman says—you will need to add a new op.
If you want to run your custom operations on GPUs, you have to provide GPU implementation (kernels) in C++. Look at the documentation here for how to extend TensorFlow with custom operations, and especially the section on GPU support.