I'm still working on my understanding of the PyTorch autograd system. One thing I'm struggling to understand is why .clamp(min=0) and nn.functional.relu() seem to have different backward passes.
It's especially confusing since .clamp is used interchangeably with relu in PyTorch tutorials, such as https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-nn.
I found this while analysing the gradients of a simple fully connected net with one hidden layer, a relu activation, and a linear output layer.
To my understanding, the output of the following code should be all zeros. I hope someone can show me what I am missing.
import torch
dtype = torch.float
x = torch.tensor([[3, 2, 1],
                  [1, 0, 2],
                  [4, 1, 2],
                  [0, 0, 1]], dtype=dtype)
y = torch.ones(4, 4)

w1_a = torch.tensor([[1, 2],
                     [0, 1],
                     [4, 0]], dtype=dtype, requires_grad=True)
w1_b = w1_a.clone().detach()
w1_b.requires_grad = True

w2_a = torch.tensor([[-1, 1],
                     [-2, 3]], dtype=dtype, requires_grad=True)
w2_b = w2_a.clone().detach()
w2_b.requires_grad = True
y_hat_a = torch.nn.functional.relu(x.mm(w1_a)).mm(w2_a)
y_a = torch.ones_like(y_hat_a)
y_hat_b = x.mm(w1_b).clamp(min=0).mm(w2_b)
y_b = torch.ones_like(y_hat_b)
loss_a = (y_hat_a - y_a).pow(2).sum()
loss_b = (y_hat_b - y_b).pow(2).sum()
loss_a.backward()
loss_b.backward()
print(w1_a.grad - w1_b.grad)
print(w2_a.grad - w2_b.grad)
# OUT:
# tensor([[  0.,   0.],
#         [  0.,   0.],
#         [  0., -38.]])
# tensor([[0., 0.],
#         [0., 0.]])
#
The reason is that clamp and relu produce different gradients at 0. This can be checked with a scalar tensor x = 0 by comparing the two versions: (x.clamp(min=0) - 1.0).pow(2).backward() versus (relu(x) - 1.0).pow(2).backward(). The resulting x.grad is 0 for the relu version but -2 for the clamp version. That means relu treats x == 0 as grad = 0, while clamp treats x == 0 as grad = 1.
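A minimal sketch of that scalar check (not from the original post; behaviour as described in the answer, which may differ across PyTorch versions):
import torch

# relu: the gradient at exactly 0 is defined as 0
x = torch.zeros(1, requires_grad=True)
(torch.nn.functional.relu(x) - 1.0).pow(2).sum().backward()
print(x.grad)  # tensor([0.])

# clamp(min=0): the gradient at exactly 0 is passed through (treated as 1)
x = torch.zeros(1, requires_grad=True)
(x.clamp(min=0) - 1.0).pow(2).sum().backward()
print(x.grad)  # tensor([-2.])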
Problem description
I have inputs x that are indicator variables, and outputs y, where each row is a random one-hot vector that depends on the values of x (data sample shown below).
I want to train a model that essentially learns the probabilistic relationship between x and y in the form of per-column weights. The model must "choose" one, and only one, indicator to output. My current approach is to sample a categorical random variable and produce a one-hot vector as a prediction.
The issue is that I'm getting an error ValueError: An operation has `None` for gradient when I try to train my Keras model.
I find this error odd, because I've trained mixture networks using Keras and Tensorflow, which use tf.contrib.distributions.Categorical, and I did not run into any gradient-related issues.
Code
Experiment
import tensorflow as tf
import tensorflow.contrib.distributions as tfd
import numpy as np
from keras import backend as K
from keras.layers import Layer
from keras.models import Sequential
from keras.utils import to_categorical
def make_xy_prob(rng, size=10000):
    rng = np.random.RandomState(rng) if isinstance(rng, int) else rng
    cols = 3
    weights = np.array([[1, 2, 3]])

    # generate data and drop zeros for now
    x = rng.choice(2, (size, cols))
    is_zeros = x.sum(axis=1) == 0
    x = x[~is_zeros]

    # use weights to create probabilities for determining y
    weighted_x = x * weights
    prob_x = weighted_x / weighted_x.sum(axis=1, keepdims=True)
    y = np.row_stack([to_categorical(rng.choice(cols, p=p), cols) for p in prob_x])

    # add zeros back and shuffle
    zeros = np.zeros(((size - len(x), cols)))
    x = np.row_stack([x, zeros])
    y = np.row_stack([y, zeros])
    shuffle_idx = rng.permutation(size)
    x = x[shuffle_idx]
    y = y[shuffle_idx]
    return x, y
class OneHotGate(Layer):
    def build(self, input_shape):
        self.kernel = self.add_weight(name='kernel', shape=(1, input_shape[1]), initializer='ones')

    def call(self, x):
        zero_cond = x < 1
        x_shape = tf.shape(x)

        # weight indicators so that more probability is assigned to more likely columns
        weighted_x = x * self.kernel

        # fill zeros with -inf so that zero probability is assigned to that column
        ninf_fill = tf.fill(x_shape, -np.inf)
        masked_x = tf.where(zero_cond, ninf_fill, weighted_x)
        onehot_gate = tf.squeeze(tfd.OneHotCategorical(logits=masked_x, dtype=x.dtype).sample(1))

        # fill gate with zeros where input was originally zero
        zeros_fill = tf.fill(x_shape, 0.0)
        masked_gate = tf.where(zero_cond, zeros_fill, onehot_gate)
        return masked_gate
def experiment(epochs=10):
    K.clear_session()
    rng = np.random.RandomState(2)
    X, y = make_xy_prob(rng)
    input_shape = (X.shape[1], )

    model = Sequential()
    gate_layer = OneHotGate(input_shape=input_shape)
    model.add(gate_layer)
    model.compile('adam', 'categorical_crossentropy')
    model.fit(X, y, 64, epochs, verbose=1)
Data sample
>>> x
array([[1., 1., 1.],
       [0., 1., 0.],
       [1., 0., 1.],
       ...,
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 0.]])
>>> y
array([[0., 0., 1.],
       [0., 1., 0.],
       [1., 0., 0.],
       ...,
       [0., 0., 1.],
       [1., 0., 0.],
       [1., 0., 0.]])
Error
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
The problem lies in the fact that OneHotCategorical performs discontinuous sampling, which causes the gradient computation to fail. To replace this discontinuous sampling with a continuous (relaxed) version, you may try RelaxedOneHotCategorical (which is based on the interesting Gumbel-Softmax technique).
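As a rough sketch of the idea (not the asker's full OneHotGate layer; the logits and the temperature value here are illustrative assumptions, and the temperature usually needs tuning or annealing):
import tensorflow as tf
import tensorflow.contrib.distributions as tfd

logits = tf.constant([[1.0, 2.0, 3.0]])

# Discontinuous: a hard one-hot sample blocks gradients w.r.t. the logits.
hard_sample = tfd.OneHotCategorical(logits=logits, dtype=tf.float32).sample()

# Relaxed (Gumbel-Softmax): the sample is a continuous vector on the simplex,
# so gradients flow back to the logits. Lower temperatures give harder samples.
soft_sample = tfd.RelaxedOneHotCategorical(temperature=0.5, logits=logits).sample()

with tf.Session() as sess:
    print(sess.run([hard_sample, soft_sample]))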
I want to feed a batch_size integer as a placeholder in Tensorflow. But it does not act as an integer. Consider the following example:
import tensorflow as tf
max_length = 5
batch_size = 3
batch_size_placeholder = tf.placeholder(dtype=tf.int32)
mask_0 = tf.one_hot(indices=[0]*batch_size_placeholder, depth=max_length, on_value=0., off_value=1.)
mask_1 = tf.one_hot(indices=[0]*batch_size, depth=max_length, on_value=0., off_value=1.)
# new session
with tf.Session() as sess:
    feed = {batch_size_placeholder: 3}
    batch, mask0, mask1 = sess.run([
        batch_size_placeholder, mask_0, mask_1
    ], feed_dict=feed)
When I print the values of batch, mask0 and mask1 I have the following:
print(batch)
>>> array(3, dtype=int32)
print(mask0)
>>> array([[0., 1., 1., 1., 1.]], dtype=float32)
print(mask1)
>>> array([[0., 1., 1., 1., 1.],
           [0., 1., 1., 1., 1.],
           [0., 1., 1., 1., 1.]], dtype=float32)
Indeed, I thought mask0 and mask1 would be the same, but it seems that TensorFlow does not treat batch_size_placeholder as an integer. I understand it is a tensor, but is there any way I can use it as an integer in my computations?
Is there any way I can fix this problem? Just FYI, I used tf.one_hot only as an example; I want to run train/validation during training in my code, where I will need many other computations with different batch_size values in the training and validation steps.
Any help would be appreciated.
In pure Python, [0]*3 is [0, 0, 0]. However, batch_size_placeholder is a placeholder; during graph execution it is a tensor, so [0]*tensor is interpreted as tensor multiplication. In your case, the result is a 1-d tensor containing a single 0. To correctly use batch_size_placeholder, you should create a tensor whose length equals batch_size_placeholder.
mask_0 = tf.one_hot(tf.zeros(batch_size_placeholder, dtype=tf.int32), depth=max_length, on_value=0., off_value=1.)
It will have the same result as mask_1.
A simple example to show the difference.
batch_size_placeholder = tf.placeholder(dtype=tf.int32)
a = [0]*batch_size_placeholder
b = tf.zeros(batch_size_placeholder, dtype=tf.int32)

with tf.Session() as sess:
    print(sess.run([a, b], feed_dict={batch_size_placeholder: 3}))
    # [array([0], dtype=int32), array([0, 0, 0], dtype=int32)]
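Putting the fix back into the original one-hot example (a sketch reusing the names from the question), the corrected mask_0 then matches mask_1:
import tensorflow as tf

max_length = 5
batch_size_placeholder = tf.placeholder(dtype=tf.int32)

# build the one-hot mask from a tensor of zeros whose length is the fed batch size
mask_0 = tf.one_hot(tf.zeros(batch_size_placeholder, dtype=tf.int32),
                    depth=max_length, on_value=0., off_value=1.)

with tf.Session() as sess:
    print(sess.run(mask_0, feed_dict={batch_size_placeholder: 3}))
    # [[0. 1. 1. 1. 1.]
    #  [0. 1. 1. 1. 1.]
    #  [0. 1. 1. 1. 1.]]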
I am trying to implement an OCR project with Keras, so I am learning from the Keras OCR example. I have used my own training data to train a new model and obtained the .h5 model file.
Now I want to test a new image to see my model's performance, so I wrote a test.py like this:
from keras.models import Model
import cv2
from keras.preprocessing.image import img_to_array
import numpy as np
from keras.models import load_model
from keras import backend as K
from allNumList import alphabet
def labels_to_text(labels):
    ret = []
    for c in labels:
        if c == len(alphabet):  # CTC Blank
            ret.append("")
        else:
            ret.append(alphabet[c])
    return "".join(ret)

def decode_predict_ctc(out, top_paths=1):
    results = []
    beam_width = 5
    if beam_width < top_paths:
        beam_width = top_paths
    for i in range(top_paths):
        lables = K.get_value(K.ctc_decode(out, input_length=np.ones(out.shape[0])*out.shape[1],
                             greedy=False, beam_width=beam_width, top_paths=top_paths)[0][i])[0]
        text = labels_to_text(lables)
        results.append(text)
    return results

def test(modelPath, testPicTest):
    img = cv2.imread(testPicTest)
    img = cv2.resize(img, (128, 64))
    img = img_to_array(img)
    img = np.array(img, dtype='float')/255.0
    img = np.expand_dims(img, axis=0)
    img = img.swapaxes(1, 2)
    model = load_model(modelPath, custom_objects={'<lambda>': lambda y_true, y_pred: y_pred})
    net_out_value = model.predict(img)
    top_pred_texts = decode_predict_ctc(net_out_value)
    return top_pred_texts
result=test(r'D:\code\testAndExperiment\py\KerasOcr\weights.h5',r'D:\code\testAndExperiment\py\KerasOcr\test\avo.jpg')
print(result)
But I get an error like this:
Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 4 array(s), but instead got the following list of 1 arrays: [array([[[[1., 1., 1.], [1., 1., 1.], [1., 1., 1.], ..., [1., 1., 1.], [1., 1., 1.], [1., 1., 1.]], [[1., 1., 1.], [1., 1., 1.],...
I have referenced some material:
https://stackoverflow.com/a/49537697/10689350
https://www.dlology.com/blog/how-to-train-a-keras-model-to-recognize-variable-length-text/
How to predict the results for OCR using keras image_ocr example?
Some answers show that we should use 4 inputs [input_data, labels, input_length, label_length] in training, but besides input_data, everything else is information used only for calculating the loss, so in testing maybe the input_data alone is enough. So I just used a picture, without labels, input_length, or label_length, but I got the error above.
I am confused: does the model need 4 inputs or 1 during testing?
It doesn't seem reasonable to require 4 inputs during the testing process. Now I have model.h5; what should I do next?
Thanks in advance.
My code is here: https://github.com/hqabcxyxz/KerasOCR/tree/master
Maybe I know why: in the OCR example, we build a Lambda layer to compute the CTC loss, and this layer needs 4 inputs!
The right way to test is to build a model without this Lambda layer for inference, then load the model weights by name and run inference. After we get the inference result, just CTC-decode it!
I will update my code on GitHub later...
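A rough sketch of this idea, assuming the training model was built (as in the Keras OCR example) from an input tensor `input_data` through a softmax output tensor `y_pred`; those names and the weight file are assumptions, and `img` / `decode_predict_ctc` refer to the question's test.py above:
from keras.models import Model

# Rebuild only the recognition path, without the CTC-loss Lambda layer and
# without the labels/input_length/label_length inputs used for training.
inference_model = Model(inputs=input_data, outputs=y_pred)

# Load the trained weights by layer name; layers that exist only in the
# training graph are simply skipped.
inference_model.load_weights('weights.h5', by_name=True)

# Predict on a single preprocessed image and CTC-decode the result,
# e.g. with the decode_predict_ctc helper from the question.
net_out_value = inference_model.predict(img)
top_pred_texts = decode_predict_ctc(net_out_value)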
I tried to build a simple MLP with an input layer (2 neurons), a hidden layer (5 neurons) and an output layer (1 neuron). I planned to train and feed it with [[0., 0.], [0., 1.], [1., 0.], [1., 1.]] to get the desired output [0., 1., 1., 0.] (element-wise).
Unfortunately, my code refuses to run. I keep getting dimensionality errors no matter what I try. Quite frustrating :/ I think I'm missing something, but I cannot figure out what is wrong.
For better readability I also uploaded the code to a pastebin: code
Any ideas?
import tensorflow as tf
#####################
# preparation stuff #
#####################
# define input and output data
input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]] # XOR input
output_data = [0., 1., 1., 0.] # XOR output
# create a placeholder for the input
# None indicates a variable batch size for the input
# one input's dimension is [1, 2]
n_input = tf.placeholder(tf.float32, shape=[None, 2])
# number of neurons in the hidden layer
hidden_nodes = 5
################
# hidden layer #
################
b_hidden = tf.Variable(0.1) # hidden layer's bias neuron
W_hidden = tf.Variable(tf.random_uniform([hidden_nodes, 2], -1.0, 1.0)) # hidden layer's weight matrix
# initialized with a uniform distribution
hidden = tf.sigmoid(tf.matmul(W_hidden, n_input) + b_hidden) # calc hidden layer's activation
################
# output layer #
################
W_output = tf.Variable(tf.random_uniform([hidden_nodes, 1], -1.0, 1.0)) # output layer's weight matrix
output = tf.sigmoid(tf.matmul(W_output, hidden)) # calc output layer's activation
############
# learning #
############
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(output, n_input) # calc cross entropy between current
# output and desired output
loss = tf.reduce_mean(cross_entropy) # mean the cross_entropy
optimizer = tf.train.GradientDescentOptimizer(0.1) # take a gradient descent for optimizing with a "stepsize" of 0.1
train = optimizer.minimize(loss) # let the optimizer train
####################
# initialize graph #
####################
init = tf.initialize_all_variables()
sess = tf.Session() # create the session and therefore the graph
sess.run(init) # initialize all variables
# train the network
for epoch in xrange(0, 201):
    sess.run(train)  # run the training operation
    if epoch % 20 == 0:
        print("step: {:>3} | W: {} | b: {}".format(epoch, sess.run(W_hidden), sess.run(b_hidden)))
EDIT: I am still getting errors :/
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)
outputs line 27 (...) ValueError: Dimensions Dimension(2) and Dimension(5) are not compatible. Altering the line to:
hidden = tf.sigmoid(tf.matmul(W_hidden, n_input) + b_hidden)
seems to be working, but then the error appears in:
output = tf.sigmoid(tf.matmul(hidden, W_output))
telling me: line 34 (...) ValueError: Dimensions Dimension(2) and Dimension(5) are not compatible
Turning the statement to:
output = tf.sigmoid(tf.matmul(W_output, hidden))
also throws an exception: line 34 (...) ValueError: Dimensions Dimension(1) and Dimension(5) are not compatible.
EDIT2: I do not really understand this. Shouldn't hidden be W_hidden x n_input.T, since dimensionally this would be (5, 2) x (2, 1)? If I transpose n_input, hidden still works (I don't even get why it works without a transpose at all). However, output keeps throwing errors, even though this operation should dimensionally be (1, 5) x (5, 1)?!
(0) It's helpful to include the error output - it's also a useful thing to look at, because it identifies exactly where the shape problems occur.
(1) The shape errors arose because you have the arguments to matmul backwards in both of your matmuls, and the tf.Variable shapes backwards. The general rule is that the weights for a layer with input_size inputs and output_size outputs should have shape [input_size, output_size], and the matmul should be tf.matmul(input_to_layer, weights_for_layer) (and then add the biases, which have shape [output_size]).
So with your code,
W_hidden = tf.Variable(tf.random_uniform([hidden_nodes, 2], -1.0, 1.0))
should be:
W_hidden = tf.Variable(tf.random_uniform([2, hidden_nodes], -1.0, 1.0))
and
hidden = tf.sigmoid(tf.matmul(W_hidden, n_input) + b_hidden)
should be tf.matmul(n_input, W_hidden); and
output = tf.sigmoid(tf.matmul(W_output, hidden))
should be tf.matmul(hidden, W_output)
(2) Once you've fixed those bugs, your run needs to be fed a feed_dict:
sess.run(train)
should be:
sess.run(train, feed_dict={n_input: input_data})
At least, I presume that this is what you're trying to achieve.
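Putting these fixes together, a minimal corrected sketch might look like the following (TF 1.x API, as in the question). The explicit label placeholder, the bias shapes, and feeding the XOR targets into the loss are additions of mine, since the posted code passes n_input as the target of the cross-entropy:
import tensorflow as tf

input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]   # XOR input
output_data = [[0.], [1.], [1.], [0.]]                  # XOR target, one column per sample

n_input = tf.placeholder(tf.float32, shape=[None, 2])
n_output = tf.placeholder(tf.float32, shape=[None, 1])

hidden_nodes = 5
W_hidden = tf.Variable(tf.random_uniform([2, hidden_nodes], -1.0, 1.0))
b_hidden = tf.Variable(tf.zeros([hidden_nodes]))
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)

W_output = tf.Variable(tf.random_uniform([hidden_nodes, 1], -1.0, 1.0))
b_output = tf.Variable(tf.zeros([1]))
logits = tf.matmul(hidden, W_output) + b_output

# sigmoid_cross_entropy_with_logits expects the raw logits and the target labels
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=n_output))
train = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(201):
        _, l = sess.run([train, loss],
                        feed_dict={n_input: input_data, n_output: output_data})
        if epoch % 20 == 0:
            print("step {:>3} | loss {:.4f}".format(epoch, l))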