Numerical errors in Keras vs Numpy - python

In order to really understand convolutional layers, I have reimplemented the forward method of a single Keras Conv2D layer in plain NumPy. The outputs of both seem almost identical, but there are some minor differences.
Getting the Keras output:
inp = K.constant(test_x)
true_output = model.layers[0].call(inp).numpy()
My output:
def relu(x):
    return np.maximum(0, x)

def forward(inp, filter_weights, filter_biases):
    result = np.zeros((1, 64, 64, 32))
    inp_with_padding = np.zeros((1, 66, 66, 1))
    inp_with_padding[0, 1:65, 1:65, :] = inp

    for filter_num in range(32):
        single_filter_weights = filter_weights[:, :, 0, filter_num]
        for i in range(64):
            for j in range(64):
                prod = single_filter_weights * inp_with_padding[0, i:i+3, j:j+3, 0]
                filter_sum = np.sum(prod) + filter_biases[filter_num]
                result[0, i, j, filter_num] = relu(filter_sum)
    return result
my_output = forward(test_x, filter_weights, biases_weights)
The results are largely the same, but here are some examples of differences:
Mine: 2.6608338356018066
Keras: 2.660834312438965
Mine: 1.7892705202102661
Keras: 1.7892701625823975
Mine: 0.007190803997218609
Keras: 0.007190565578639507
Mine: 4.970898151397705
Keras: 4.970897197723389
I've tried converting everything to float32, but that does not solve it. Any ideas?
Edit:
I plotted the distribution of the errors, and it might give some insight into what is happening. As can be seen, the errors all have very similar values, falling into four groups. However, the errors are not exactly these four values; almost all of them are unique values clustered around these four peaks.
I am very interested in how to get my implementation to exactly match the Keras one. Unfortunately, the errors seem to grow exponentially when stacking multiple layers. Any insight would help me out a lot!

Given how small the differences are, I would say that they are rounding errors.
I recommend using np.isclose (or math.isclose) to check if floats are "equal".
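For example (a quick sketch reusing the arrays from the question; np.allclose is just np.isclose reduced over all elements, and the tolerances are reasonable defaults, nothing Keras-specific):
print(np.allclose(my_output, true_output, rtol=1e-5, atol=1e-6))  # element-wise comparison with tolerance
print(np.max(np.abs(my_output - true_output)))                    # largest absolute difference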

Floating point operations are not associative, so the order in which you do them matters. Here is an example:
In [19]: 1.2 - 1.0 - 0.2
Out[19]: -5.551115123125783e-17
In [21]: 1.2 - 0.2 - 1.0
Out[21]: 0.0
So if you want completely identical results, it is not enough to do the same computations analytically; you also need to do them in the exact same order, with the same datatypes and the same rounding behaviour.
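The same effect shows up in float32, which is most likely what the Conv2D layer computes in. A minimal sketch with random data, purely illustrative: accumulating the same numbers in two different orders usually disagrees in the last bits.
import numpy as np

rng = np.random.default_rng(0)
vals = rng.standard_normal(10_000).astype(np.float32)

fwd = np.float32(0.0)
for v in vals:            # accumulate front to back
    fwd += v
rev = np.float32(0.0)
for v in vals[::-1]:      # accumulate back to front
    rev += v

print(fwd, rev, fwd == rev)  # the two sums typically differ in the last bits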
To debug this, start with the Keras code and change it line by line towards your code, until you see a difference.

The first thing to check is whether the layer uses padding='same'; your implementation pads as if it does.
If the layer uses another kind of padding, including the default padding='valid', there will be a difference.
Another possibility is that you are accumulating error because of the triple loop of small sums.
You could do the whole computation at once and see whether the result changes. Compare this implementation with your own, for instance:
def forward2(inp, filter_weights, filter_biases):
    # inp: (batch, 64, 64, in)
    # w:   (3, 3, in, out)
    # b:   (out,)

    padded_input = np.pad(inp, ((0, 0), (1, 1), (1, 1), (0, 0)))  # (batch, 66, 66, in)

    stacked_input = np.stack([
        padded_input[:, :-2],
        padded_input[:, 1:-1],
        padded_input[:, 2:]], axis=1)                  # (batch, 3, 64, 66, in)
    stacked_input = np.stack([
        stacked_input[:, :, :, :-2],
        stacked_input[:, :, :, 1:-1],
        stacked_input[:, :, :, 2:]], axis=2)           # (batch, 3, 3, 64, 64, in)

    stacked_input = stacked_input.reshape((-1, 3, 3, 64, 64, 1, 1))
    w = filter_weights.reshape((1, 3, 3, 1, 1, 1, 32))
    b = filter_biases.reshape((1, 1, 1, 32))

    result = stacked_input * w             # (-1, 3, 3, 64, 64, 1, 32)
    result = result.sum(axis=(1, 2, -2))   # (-1, 64, 64, 32)
    result += b

    result = relu(result)
    return result
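To compare both against the Keras output, something like this should do (a sketch reusing the variables from the question, assuming test_x already has shape (1, 64, 64, 1)):
out1 = forward(test_x, filter_weights, biases_weights)   # your loop version
out2 = forward2(test_x, filter_weights, biases_weights)  # the vectorized version
print(np.abs(out1 - true_output).max())
print(np.abs(out2 - true_output).max())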
A third possibility is to check whether you're running on a GPU, and switch everything to the CPU for the test. Some GPU algorithms are even non-deterministic.
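For example, you could pin the Keras call to the CPU for the comparison (a minimal sketch, assuming TF 2.x eager execution as in the question):
import tensorflow as tf

with tf.device('/CPU:0'):
    true_output_cpu = model.layers[0].call(tf.constant(test_x)).numpy()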
For any kernel size:
def forward3(inp, filter_weights, filter_biases):
    inShape = inp.shape            # (batch, imgX, imgY, ins)
    wShape = filter_weights.shape  # (wx, wy, ins, out)
    bShape = filter_biases.shape   # (out,)

    ins = inShape[-1]
    out = wShape[-1]
    wx = wShape[0]
    wy = wShape[1]
    imgX = inShape[1]
    imgY = inShape[2]

    assert imgX >= wx
    assert imgY >= wy
    assert inShape[-1] == wShape[-2]
    assert bShape[-1] == wShape[-1]

    # you may need to invert this padding, exchange L with R
    loseX = wx - 1
    padXL = loseX // 2
    padXR = padXL + (1 if loseX % 2 > 0 else 0)
    loseY = wy - 1
    padYL = loseY // 2
    padYR = padYL + (1 if loseY % 2 > 0 else 0)

    padded_input = np.pad(inp, ((0, 0), (padXL, padXR), (padYL, padYR), (0, 0)))
    # (batch, paddedX, paddedY, in)

    stacked_input = np.stack([padded_input[:, i:imgX + i] for i in range(wx)],
                             axis=1)  # (batch, wx, imgX, paddedY, in)
    stacked_input = np.stack([stacked_input[:, :, :, i:imgY + i] for i in range(wy)],
                             axis=2)  # (batch, wx, wy, imgX, imgY, in)

    stacked_input = stacked_input.reshape((-1, wx, wy, imgX, imgY, ins, 1))
    w = filter_weights.reshape((1, wx, wy, 1, 1, ins, out))
    b = filter_biases.reshape((1, 1, 1, out))

    result = stacked_input * w
    result = result.sum(axis=(1, 2, -2))
    result += b

    result = relu(result)
    return result
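A quick sanity check of forward3 (a sketch with random data; it relies on relu and forward2 defined above, and the shapes are just assumptions matching the question):
rng = np.random.default_rng(42)
x = rng.standard_normal((1, 64, 64, 1)).astype(np.float32)
w3 = rng.standard_normal((3, 3, 1, 32)).astype(np.float32)
w5 = rng.standard_normal((5, 5, 1, 32)).astype(np.float32)
b = rng.standard_normal((32,)).astype(np.float32)

print(np.allclose(forward2(x, w3, b), forward3(x, w3, b)))  # same result for the 3x3 case
print(forward3(x, w5, b).shape)                             # (1, 64, 64, 32), now with a 5x5 kernel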

Related

How unfold operation works in pytorch with dilation and stride?

In my case I am applying this unfold operation on a tensor A as given below:
A.shape = torch.Size([16, 309, 128])
A = A.unsqueeze(1)  # that's, I guess, to make it 4-dim for the unfold operation
A_out = F.unfold(A, (7, 128), stride=(1, 128), dilation=(3, 1))
A_out.shape = torch.Size([16, 896, 291])
I am not getting this 291. If the dilation factor is not there, it would be [16, 896, 303], right?
But if dilation=3 then how is it 291? Also here stride is not mentioned so the default is 1, but what if it is also mentioned, like 4? Please guide.
Also here stride is not mentioned so the default is 1, but what if it is also mentioned, like 4?
Your code already has stride=(1, 128). If stride were set to just 4, it would be used as (4, 4) in this case. This can easily be verified with the formula below.
If the dilation factor is not there, it would be [16,896,303] right?
Yes. Example below.
But if dilation=3 then how is it 291?
Following the formula given in the PyTorch docs, it comes to 291. After doing A.unsqueeze(1) the shape becomes [16, 1, 309, 128]. Here N=16, C=1, H=309, W=128.
The output dimension is (N, C * product(kernel_size), L). With kernel_size=(7, 128) this becomes (16, 1 * 7 * 128, L) = (16, 896, L).
L can be calculated using the formula below, taking the product over the spatial dimensions:
L = d3 * d4
Over the height dimension: spatial_size[3] = 309, padding[3] = 0 (default), dilation[3] = 3, kernel_size[3] = 7, stride[3] = 1.
d3 = (309 + 2 * 0 - 3 * (7 - 1) - 1) / 1 + 1 = 291
Over the width dimension: spatial_size[4] = 128, padding[4] = 0 (default), dilation[4] = 1, kernel_size[4] = 128, stride[4] = 128.
d4 = (128 + 2 * 0 - 1 * (128 - 1) - 1) / 128 + 1 = 1
So, using the above formula, L = 291 * 1 = 291.
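If you prefer not to do the arithmetic by hand, a small helper following that formula looks like this (a sketch; the last line assumes a scalar stride=4 is broadcast to (4, 4), as mentioned above):
def unfold_L(spatial, kernel, stride, dilation, padding=(0, 0)):
    # product over spatial dims of floor((size + 2*pad - dilation*(kernel-1) - 1) / stride) + 1
    L = 1
    for s, k, st, d, p in zip(spatial, kernel, stride, dilation, padding):
        L *= (s + 2 * p - d * (k - 1) - 1) // st + 1
    return L

print(unfold_L((309, 128), (7, 128), (1, 128), (3, 1)))  # 291, as computed above
print(unfold_L((309, 128), (7, 128), (1, 128), (1, 1)))  # 303, the no-dilation case
print(unfold_L((309, 128), (7, 128), (4, 4), (3, 1)))    # 73, if stride=4 were used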
Code
import torch
from torch.nn import functional as F
A = torch.randn([16, 309,128])
print(A.shape)
A = A.unsqueeze(1)
print(A.shape)
A_out= F.unfold(A, kernel_size=(7, 128), stride=(1,128),dilation=(3,1))
print(A_out.shape)
Output
torch.Size([16, 309, 128])
torch.Size([16, 1, 309, 128])
torch.Size([16, 896, 291])
Links
https://pytorch.org/docs/stable/generated/torch.nn.Unfold.html
https://pytorch.org/docs/stable/generated/torch.nn.functional.unfold.html

Allow or prohibit permutation in tensorflow loss function?

I'm training a network to reconstruct coordinates of specific structures in an image. Until now, my loss function contains three 2D vectors (i.e. 6 variables) for the coordinates, which are learned via MSE, and three corresponding classifiers, learned via SigmoidFocalCrossEntropy, indicating whether 0, 1, 2 or 3 of these structures are present. I thought it might be beneficial to tell TensorFlow that the order in which the vectors are reconstructed does not matter, as long as the classifiers are still correct. Simple example:
loss(tf.constant([[30, 20, 15, 7, 0, 0, 1, 1, 0]], dtype=tf.float32),
tf.constant([[0, 0, 15, 7, 30, 20, 0, 1, 1]], dtype=tf.float32)) == 0
To implement this I used tf.argsort on the magnitude of each vector:
def sort(tensor):
    x = tf.unstack(tensor, axis=-1)
    squ = []
    for i in range(len(x) // 2):
        i *= 2
        squ.append(x[i] ** 2 + x[i + 1] ** 2)
    new = tf.stack(squ, axis=-1)
    return tf.argsort(new, axis=-1, direction='ASCENDING',
                      stable=False, name=None)
and then permuted the tensor accordingly:
def permute_tensor_structure(tensor, indices):
    c = indices + 6
    x = indices * 2
    y = indices * 2 + 1
    v = tf.transpose([x, y])
    v = tf.reshape(v, [indices.shape[0], -1])
    perm = tf.concat([v, c], axis=-1)
    return tf.gather(tensor, perm, batch_dims=1, name=None, axis=-1)
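For illustration, this is what permute_tensor_structure does to the example from the beginning (a small walk-through; the same call appears in the test case further down):
tensor = tf.constant([[30, 20, 15, 7, 0, 0, 1, 1, 0]], dtype=tf.float32)
permuted = permute_tensor_structure(tensor, tf.constant([[2, 1, 0]]))
# indices [2, 1, 0] -> coordinate columns [4, 5, 2, 3, 0, 1] plus classifier columns [8, 7, 6]
print(permuted)  # [[ 0.  0. 15.  7. 30. 20.  0.  1.  1.]]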
I did the same for my ground truth and got the network up and running.
Minimal example extracted from my code:
import tensorflow as tf
from tensorflow_addons.losses import SigmoidFocalCrossEntropy

def compute_permute_loss(truth, predict):
    l2 = tf.keras.losses.MeanSquaredError()
    ce = SigmoidFocalCrossEntropy()
    indices = sort(predict[:, 0:6])
    indices2 = sort(truth[:, 0:6])
    predict = permute_tensor_structure(predict, indices)
    truth = permute_tensor_structure(truth, indices2)
    L2 = l2(predict[:, 0:6], truth[:, 0:6])
    BCE = tf.reduce_sum(ce(truth[:, 6:], predict[:, 6:]))
    return 3 * L2 + BCE

class Test(tf.test.TestCase):
    def test_permutation_loss(self):
        tensor1 = tf.constant(
            [[30, 20, 15, 7, 0, 0, 1, 1, 0]],
            dtype=tf.float32)
        tensor2 = permute_tensor_structure(tensor1, tf.constant([[2, 1, 0]]))
        loss = compute_permute_loss(tensor1, tensor2)
        self.assertEqual(loss, 0,
                         msg="loss for only permuted tensors is not zero")
        tensor3 = tf.constant(
            [[29, 22, 15, 7, 0, 0, 1, 1, 0]],
            dtype=tf.float32)
        loss = compute_permute_loss(tensor3, tensor2)
        self.assertAllClose(loss, (1.0 + 4.0) / 2.0,
                            msg="loss for values is not rmse")

if __name__ == "__main__":
    tf.test.main()

# [ RUN ] Test.test_permutation_loss
# [ OK ]  Test.test_permutation_loss
However, I'm afraid the permutation in the loss could backfire and impair TensorFlow's backpropagation. Maybe somebody has already faced a similar problem? Or has deeper knowledge of TensorFlow's graph building and backpropagation? I would be grateful for every suggestion or input.

How to convert numpy code into tensorflow?

I am trying to convert numpy code into tensorflow graph format. But somewhere I am missing an understanding of dimensionality.
Here is numpy code:
def delta_to_boxes3d(deltas, anchors, coordinate='lidar'):
    # Input:
    #   deltas:  (N, w, l, 14), e.g. (200, 240, 14)
    #   feature_map_shape: (w, l)
    #   anchors: (w, l, 2, 7), e.g. (200, 240, 2, 7)
    # Output:
    #   boxes3d: (N, w*l*2, 7)

    # anchors shape: (200, 240, 2, 7)
    anchors_reshaped = anchors.reshape(-1, 7)  # (96000, 7)
    deltas = deltas.reshape(-1, 7)             # (96000, 7)
    anchors_d = np.sqrt(anchors_reshaped[:, 4]**2 + anchors_reshaped[:, 5]**2)

    boxes3d = np.zeros_like(deltas)
    boxes3d[..., [0, 1]] = deltas[..., [0, 1]] * \
        anchors_d[:, np.newaxis] + anchors_reshaped[..., [0, 1]]  # in this line I have the problem
    boxes3d[..., [2]] = deltas[..., [2]] * \
        1.73 + anchors_reshaped[..., [2]]  # ANCHOR_H = 1.73
    boxes3d[..., [3, 4, 5]] = np.exp(
        deltas[..., [3, 4, 5]]) * anchors_reshaped[..., [3, 4, 5]]
    boxes3d[..., 6] = deltas[..., 6] + anchors_reshaped[..., 6]

    return boxes3d
Here is the code which I have been trying:
def delta_boxes3d():
    anchors = tf.placeholder(tf.float32, shape=[None, None, 2, 7], name="anchor")  # check the anchor type later
    anchors_reshaped = tf.reshape(anchors, shape=[96000, 7])
    delta = tf.placeholder(tf.float32, shape=[None, None, 14], name="delta")
    anchors_d = tf.sqrt(tf.add(tf.pow(anchors_reshaped[:, 4], 2), tf.pow(anchors_reshaped[:, 5], 2)))  # (96000,)
    deltas = tf.reshape(delta, [96000, 7])
    x_shape = tf.shape(deltas)

    boxes3d_ = tf.multiply(deltas[:, 0:2], tf.add(tf.expand_dims(anchors_d, -1), anchors_reshaped[:, 0:2]))
    boxes3d = tf.ones(x_shape[:-1]) + boxes3d_

    delta_ = np.random.rand(200, 240, 14)
    anchor_ = np.random.rand(200, 240, 2, 7)

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    result = sess.run(boxes3d, feed_dict={anchors: anchor_, delta: delta_})  # (96000, 7), need to get boxes3d
    print(result.shape)
I am getting the error below:
ValueError: Dimensions must be equal, but are 96000 and 2 for '{{node add_2}} = AddV2[T=DT_FLOAT](ones, Mul)' with input shapes: [96000], [96000,2].
Could someone help me with this?
Thanks in advance
The error comes from the line boxes3d = tf.ones(x_shape[:-1]) + boxes3d_.
You are trying to add shapes (96000,) and (96000, 2), which you can't do without expanding dims. If you just want to add a scalar, you can do boxes3d = 1 + boxes3d_.
In the example above, you want to do the multiplication first, followed by the addition.
Note that in the NumPy example, in the highlighted line, you do the multiplication first and then the addition. In your TensorFlow code you did it the other way around (possibly by mistake).
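In other words, the direct translation of that NumPy line multiplies first and then adds the anchor offsets, e.g. (a sketch using the same names as in your code):
boxes3d_01 = tf.add(tf.multiply(deltas[:, 0:2], tf.expand_dims(anchors_d, -1)),
                    anchors_reshaped[:, 0:2])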
I rewrote your NumPy example to Tensorflow 2 so that both functions return the same output.
def delta_boxes3d(deltas, anchors):
    deltas = tf.constant(deltas)
    anchors = tf.constant(anchors)
    anchors_reshaped = tf.reshape(anchors, shape=[96000, 7])
    anchors_d = tf.sqrt(tf.add(tf.pow(anchors_reshaped[:, 4], 2), tf.pow(anchors_reshaped[:, 5], 2)))  # (96000,)
    deltas = tf.reshape(deltas, [96000, 7])

    boxes3d_01 = tf.add(tf.multiply(deltas[:, 0:2], tf.expand_dims(anchors_d, -1)), anchors_reshaped[:, 0:2])
    boxes3d_2 = deltas[..., 2:3] * 1.73 + anchors_reshaped[..., 2:3]
    boxes3d_345 = tf.exp(deltas[..., 3:6]) * anchors_reshaped[..., 3:6]
    boxes3d_6 = deltas[..., 6:7] + anchors_reshaped[..., 6:7]

    boxes3d = tf.concat([boxes3d_01, boxes3d_2, boxes3d_345, boxes3d_6], axis=-1)
    return boxes3d
deltas = np.random.rand(200, 240, 14)
anchors = np.random.rand(200, 240, 2, 7)
print(delta_to_boxes3d(deltas, anchors))
print(delta_boxes3d(deltas, anchors))
You can notice that I created smaller tensors first and then concatenated them; this is because TensorFlow doesn't allow item assignment on EagerTensors.
Notice the difference between deltas[..., 2] and deltas[..., 2:3]: the second one doesn't reduce the last dimension. They return shapes (96000,) and (96000, 1) respectively.
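A quick way to see that difference (a sketch with a dummy tensor):
x = tf.zeros([96000, 7])
print(x[..., 2].shape)    # (96000,)   - the last dimension is dropped
print(x[..., 2:3].shape)  # (96000, 1) - the last dimension is kept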

How to vectorize the sum? tensor[i,:,:,:] + tensor[i]

I want to vectorize the following code:
def style_noise(self, y, style):
    n = torch.randn(y.shape)
    for i in range(n.shape[0]):
        n[i] = (n[i] - n.mean(dim=(1, 2, 3))[i]) * style.std(dim=(1, 2, 3))[i] / n.std(dim=(1, 2, 3))[i] + style.mean(dim=(1, 2, 3))[i]
    noise = Variable(n, requires_grad=False).to(y.device)
    return noise
I didn't find a nice way of doing this.
y and style are 4d tensors, say style.shape = y.shape = [64, 3, 128, 128].
I want to return the noise tensor, noise.shape = [64, 3, 128, 128].
Please let me know in the comments if the question is not clear.
Your use case is exactly why the .mean and .std methods come with a keepdim parameter. You can make use of this to enable broadcasting semantics to vectorize things for you:
def style_noise(self, y, style):
    n = torch.randn(y.shape)
    n_mean = n.mean(dim=(1, 2, 3), keepdim=True)
    n_std = n.std(dim=(1, 2, 3), keepdim=True)
    style_mean = style.mean(dim=(1, 2, 3), keepdim=True)
    style_std = style.std(dim=(1, 2, 3), keepdim=True)
    n = (n - n_mean) * style_std / n_std + style_mean
    noise = Variable(n, requires_grad=False).to(y.device)
    return noise
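A quick check of the shapes involved (a sketch using the tensor sizes from the question):
import torch

n = torch.randn(64, 3, 128, 128)
m = n.mean(dim=(1, 2, 3), keepdim=True)
s = n.std(dim=(1, 2, 3), keepdim=True)
print(m.shape, s.shape)     # torch.Size([64, 1, 1, 1]) twice
print(((n - m) / s).shape)  # broadcasts back to torch.Size([64, 3, 128, 128])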
To calculate the mean and std over the whole tensor, pass no arguments:
m = t.mean(); print(m)  # no dim: mean over the whole tensor
s = t.std(); print(s)   # no dim: std over the whole tensor
Then, if your shape is (2, 2, 2) for instance, create tensors for the broadcasted subtraction and division:
ss = torch.empty(2, 2, 2).fill_(s)
print(ss)
mm = torch.empty(2, 2, 2).fill_(m)
print(mm)
At the moment keepdim is not working as expected when you don't set the dim.
m = t.mean(); print(m)       # mean over the whole tensor
s = t.std(); print(s)        # std over the whole tensor
m = t.mean(dim=0); print(m)  # mean over dim 0 (per column)
s = t.std(dim=0); print(s)   # std over dim 0 (per column)
m = t.mean(dim=1); print(m)  # mean over dim 1 (per row)
s = t.std(dim=1); print(s)   # std over dim 1 (per row)
m = t.mean(keepdim=True); print(m)  # will not work without dim
s = t.std(keepdim=True); print(s)   # will not work without dim
If you set dim as a tuple of axes, it will return the mean over those axes, not over the whole tensor.

Theano/numpy advanced indexing

I have a 4d theano tensor (with the shape (1, 700, 16, 95000) for example) and a 4d 'mask' tensor with the shape (1, 700, 16, 1024) such that every element in the mask is an index that I need from the original tensor. How can I use my mask to index my tensor? Things like sample[mask] or sample[:, :, :, mask] don't really seem to work.
I also tried using a binary mask but since the tensor is rather large I get a 'device out of memory' exception.
Other ideas on how to get my indices from the tensor would also be very appreciated.
Thanks
So, in the absence of an answer, I've decided to go with the more computationally intensive solution: flattening both my data and the index tensors, adding an offset to the indices to bring them to global (flattened) positions, indexing the data, and reshaping it back to the original shape.
I'm adding here my test code, including a (commented-out) solution for matrices.
import numpy as np
import theano
import theano.tensor as T

def theano_convertion(els, inds, offsets):
    els = T.flatten(els)
    inds = T.flatten(inds) + offsets
    return T.reshape(els[inds], (2, 3, 16, 5))

if __name__ == '__main__':
    # command: np.transpose(t[range(2), indices])
    # t = np.random.randint(0, 10, (2, 20))
    # indices = np.random.randint(0, 10, (5, 2))
    t = np.random.randint(0, 10, (2, 3, 16, 20)).astype('int32')
    indices = np.random.randint(0, 10, (2, 3, 16, 5)).astype('int32')
    offsets = np.asarray(range(1, 2 * 3 * 16 + 1), dtype='int32')
    offsets = (offsets * 20) - 20
    offsets = np.repeat(offsets, 5)

    offsets_tens = T.ivector('offsets')
    inds_tens = T.itensor4('inds')
    t_tens = T.itensor4('t')

    func = theano.function(
        [t_tens, inds_tens, offsets_tens],
        [theano_convertion(t_tens, inds_tens, offsets_tens)]
    )

    shaped_elements = []
    flattened_elements = []
    [tmp] = func(t, indices, offsets)
    for i in range(2):
        for j in range(3):
            for k in range(16):
                shaped_elements.append(t[i, j, k, indices[i, j, k, :]])
                flattened_elements.append(tmp[i, j, k, :])
                print(shaped_elements[-1] == flattened_elements[-1])
