Hello, I'm new to TensorFlow and I'd like to concatenate a 2D tensor to a 3D one, but I don't know how to do it using TensorFlow functions.
tensor_3d = [[[1,2], [3,4]], [[5,6], [7,8]]] # shape (2, 2, 2)
tensor_2d = [[10,11], [12,13]] # shape (2, 2)
out: [[[1,2,10,11], [3,4,10,11]], [[5,6,12,13], [7,8,12,13]]] # shape (2, 2, 4)
I could make it work using loops and new numpy arrays, but that way I wouldn't be using TensorFlow transformations. Any suggestions on how to make this possible? I don't see how transformations like tf.expand_dims or tf.reshape could help here...
Thanks for sharing your knowledge.
This should do the trick:
import tensorflow as tf
a = tf.constant([[[1,2], [3,4]], [[5,6], [7,8]]])
b = tf.constant([[10,11], [12,13]])
c = tf.expand_dims(b, axis=1) # Add dimension
d = tf.tile(c, multiples=[1,2,1]) # Duplicate in this dimension
e = tf.concat([a,d], axis=-1) # Concatenate on innermost dimension
with tf.Session() as sess:
    print(e.eval())
Gives:
[[[ 1 2 10 11]
[ 3 4 10 11]]
[[ 5 6 12 13]
[ 7 8 12 13]]]
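If you're on TensorFlow 2.x (an assumption; the answer above uses the 1.x Session API), the same sequence runs eagerly with no session:
import tensorflow as tf
a = tf.constant([[[1,2], [3,4]], [[5,6], [7,8]]])
b = tf.constant([[10,11], [12,13]])
c = tf.expand_dims(b, axis=1)        # shape (2, 1, 2)
d = tf.tile(c, multiples=[1, 2, 1])  # shape (2, 2, 2)
e = tf.concat([a, d], axis=-1)       # shape (2, 2, 4)
print(e.numpy())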
There is actually a different trick that is used from time to time in code bases such as OpenAI's baselines.
Suppose you have two tensors for your Gaussian policy, mu and std. The standard deviation has the same shape as mu for batch size 1, but because you use the same parameterized standard deviation for all actions, the two shapes differ when the batch size is larger than 1:
mu : Size<batch_size, feat_n>
std: Size<1, feat_n>
In this case a simple thing to do (and what the OpenAI baselines code does) is:
params = tf.concat([mu, mu * 0 + std], axis=-1)  # tf.concat requires an axis argument
The zero multiplication broadcasts std up to the same shape as mu.
Enjoy, and good luck training!
ps: numpy's and tensorflow's concat operators do not automagically apply broadcasting because, according to the maintainers, when the shapes of two tensors don't match it is usually the result of a programming error. This is not a big deal in numpy because the computations are evaluated eagerly, but with tensorflow it means that you have to explicitly broadcast the lower-rank tensor (or the one that has shape [1, *_]) by hand, for example using the tf.shape operator.
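For concreteness, here is a minimal sketch of the trick (batch_size and feat_n are illustrative values, not from the original code):
import tensorflow as tf
batch_size, feat_n = 32, 4
mu = tf.zeros((batch_size, feat_n))   # Size<batch_size, feat_n>
std = tf.ones((1, feat_n))            # Size<1, feat_n>
# mu * 0 + std broadcasts std up to mu's shape, so the two
# tensors can be concatenated along the feature axis
params = tf.concat([mu, mu * 0 + std], axis=-1)
print(params.shape)                   # (32, 8)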
I have two tensors containing batches of matrices of the same batch size (first dimension) but different matrix structure (all other dimensions).
For example, A of shape (n,d,d) and B of shape (n,e,e).
Now I would like to build block diagonals of A and B for all n,
so that the output has shape (n,(d+e),(d+e)).
Is there an implementation for a problem like this?
I could only find torch.block_diag, which is not suited for dimensions higher than 2.
Unfortunately there's no vectorized implementation; you'd have to loop through the batch:
A = torch.rand((2, 2, 2))
B = torch.rand((2, 3, 3))
C = torch.zeros((2, 5, 5))
for i in range(2):
    C[i] = torch.block_diag(A[i], B[i])
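Wrapped up as a function for arbitrary batch size (a minimal sketch; batched_block_diag is a made-up name, and the shapes follow the question's (n,d,d) and (n,e,e)):
import torch
def batched_block_diag(A, B):
    # A: (n, d, d), B: (n, e, e) -> (n, d+e, d+e)
    n, d, e = A.shape[0], A.shape[1], B.shape[1]
    C = torch.zeros(n, d + e, d + e, dtype=A.dtype)
    for i in range(n):
        C[i] = torch.block_diag(A[i], B[i])
    return C
A = torch.rand(2, 2, 2)
B = torch.rand(2, 3, 3)
print(batched_block_diag(A, B).shape)  # torch.Size([2, 5, 5])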
I was trying to understand how matrix multiplication works in more than 2 dimensions in DL frameworks, and I stumbled upon an article here.
The author used Keras to explain it, and the code works for them.
But when I try to reproduce it in PyTorch, it fails with the error shown in the output of the following code.
PyTorch Code:
a = torch.ones((2,3,4))
b = torch.ones((7,4,5))
c = torch.matmul(a,b)
print(c.shape)
Output: RuntimeError: The size of tensor a (2) must match the size of tensor b (7) at non-singleton dimension 0
Keras Code:
a = K.ones((2,3,4))
b = K.ones((7,4,5))
c = K.dot(a,b)
print(c.shape)
Output:(2, 3, 7, 5)
Can somebody explain what it is that I'm doing wrong?
Matrix multiplication (aka matrix dot product) is a well-defined algebraic operation on two 2D matrices.
Deep-learning frameworks (e.g., tensorflow, keras, pytorch) are tuned to operate on batches of matrices; hence they usually implement batched matrix multiplication, that is, applying the matrix dot product to a batch of 2D matrices.
The examples you linked to show how matmul processes a batch of matrices:
a = tf.ones((9, 8, 7, 4, 2))
b = tf.ones((9, 8, 7, 2, 5))
c = tf.matmul(a, b)
Note how all but the last two dimensions are identical ((9, 8, 7)).
This is NOT the case in your example - the leading ("batch") dimensions are different, hence the error.
Using identical leading dimensions in pytorch:
a = torch.ones((2,3,4))
b = torch.ones((2,4,5))
c = torch.matmul(a,b)
print(c.shape)
results in
torch.Size([2, 3, 5])
If you insist on dot products with different batch dimensions, you will have to explicitly define how to multiply the two tensors. You can do that using the very flexible torch.einsum:
a = torch.ones((2,3,4))
b = torch.ones((7,4,5))
c = torch.einsum('ijk,lkm->ijlm', a, b)
print(c.shape)
Resulting in:
torch.Size([2, 3, 7, 5])
I have a batch of matrices A with size torch.Size([batch_size, 9, 5]) and weight matrices B with size torch.Size([3, 5, 6]). In Keras, a simple K.dot(A, B) is able to handle the matrix multiplication, giving an output of size (batch_size, 9, 3, 6). Here, each row in A is multiplied by each of the 3 matrices in B to form a (3x6) matrix.
How do you perform a similar operation in torch? From the documentation, torch.bmm requires that A and B have the same batch size, so I tried this:
B = B.unsqueeze(0).repeat((batch_size, 1, 1, 1))
B.size() # torch.Size([batch_size, 3, 5, 6])
torch.bmm(A,B) # gives an error
RuntimeError: invalid argument 2: expected 3D tensor, got 4D
Well, the error is expected but how do I perform such an operation?
You can use Einstein notation to describe the operation you want as bxy,iyk->bxik, and use einsum to calculate it.
torch.einsum('bxy,iyk->bxik', (A, B)) will give you the answer you want.
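For example, a quick shape check (batch_size is an illustrative value):
import torch
batch_size = 4
A = torch.randn(batch_size, 9, 5)
B = torch.randn(3, 5, 6)
# contract over the shared dimension y (size 5) for every
# combination of batch element b and weight matrix i
C = torch.einsum('bxy,iyk->bxik', A, B)
print(C.shape)  # torch.Size([4, 9, 3, 6])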
I want to multiply stacked matrices expressed in tensor form.
tensor.shape == [2,5,7,6]
where 2 and 5 are batch dimensions,
tensor2.shape == [5,6,8]
where 5 is the batch size.
In numpy, tensor2 is automatically broadcast to a [2,5,6,8] tensor,
so I can easily use np.matmul(tensor, tensor2),
but in tensorflow an error occurs.
I tried tf.expand_dims(tensor2, 0) but this doesn't work either.
Is there any way to broadcast tensors in tensorflow?
You could use tf.einsum:
tf.einsum('abij,bjk->abik', tensor, tensor2)
Example:
import tensorflow as tf
x = tf.zeros((2, 5, 7, 6))
y = tf.zeros((5, 6, 8))
z = tf.einsum('abij,bjk->abik', x, y)
z.shape.as_list()
# returns [2, 5, 7, 8]
The most general and appropriate way to tackle such problems is to use tf.einsum. This function allows you to directly specify the multiplication rules using Einstein notation, which was invented to operate on tensors of arbitrary dimensions.
I am struggling once again with Python, NumPy and arrays while computing some calculations between matrices.
The code part that is likely not working properly is as follows:
train, test, cv = np.array_split(data, 3, axis = 0)
train_inputs = train[:,: -1]
test_inputs = test[:,: -1]
cv_inputs = cv[:,: -1]
train_outputs = train[:, -1]
test_outputs = test[:, -1]
cv_outputs = cv[:, -1]
When printing each matrix's information (np.ndim, np.shape and dtype respectively), this is what you get:
train_inputs:  ndim 2, shape (94936, 30), dtype float64
train_outputs: ndim 1, shape (94936,), dtype float64
test_inputs:   ndim 2, shape (94936, 30), dtype float64
test_outputs:  ndim 1, shape (94936,), dtype float64
cv_inputs:     ndim 2, shape (94935, 30), dtype float64
cv_outputs:    ndim 1, shape (94935,), dtype float64
I believe it is missing 1 dimension in all *_output arrays.
The other matrix I need is created by this command:
newMatrix = neuronLayer(30, 94936)
In which neuronLayer is a class defined as:
class neuronLayer():
    def __init__(self, neurons, neuron_inputs):
        self.weights = 2 * np.random.random((neuron_inputs, neurons)) - 1
Here's the final output:
outputLayer1 = self.__sigmoid(np.dot(inputs, self.layer1.weights))
ValueError: shapes (94936,30) and (94936,30) not aligned: 30 (dim 1) != 94936 (dim 0)
Python is clearly telling me the matrix shapes don't line up, but I don't understand where the problem is.
Any tips?
PS: The full code is pasted here.
layer1 = neuronLayer(30, 94936) # 29 neurons with 227908 inputs
layer2 = neuronLayer(1, 30) # 1 Neuron with the previous 29 inputs
where `neuronLayer` creates
self.weights = 2 * np.random.random((neuron_inputs, neurons)) - 1
the 2 weights are (94936,30) and (30,1) in size.
This line does not make any sense; I'm surprised it doesn't give an error:
layer1error = layer2delta.dot(self.layer2.weights.np.transpose)
I suspect you want np.transpose(self.layer2.weights) or self.layer2.weights.T.
But maybe it doesn't get there. train first calls think with a (94936,30) inputs array:
outputLayer1 = self.__sigmoid(np.dot(inputs, self.layer1.weights))
outputLayer2 = self.__sigmoid(np.dot(outputLayer1, self.layer2.weights))
So it tries to do an np.dot with two (94936,30) arrays. They aren't compatible for a dot. You could transpose one or the other, producing either a (94936,94936) array or a (30,30) one. The first looks too big; the (30,30) is compatible with the weights for the 2nd layer.
np.dot(inputs.T, self.layer1.weights)
has a chance of working right.
np.dot(outputLayer1, self.layer2.weights)
(30,30) with (30,1) => (30,1)
But then you do
train_outputs - outputLayer2
That will have problems regardless of whether train_outputs is (94936,) or (94936,1).
You need to make sure that array shapes flow correctly through the calculation. Don't just check them at the start; check them internally too. And make sure you understand what shapes they should have at each step.
It would be a whole lot easier to develop and test this code with much smaller inputs and layers, something like 10 samples and 3 features. That way you can look at the values as well as the shapes.
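For example, a minimal sketch of that kind of small-scale check (10 samples, 3 features and a 2-unit hidden layer are illustrative sizes, not the original code):
import numpy as np
rng = np.random.default_rng(0)
inputs = rng.random((10, 3))                # 10 samples, 3 features
w1 = rng.random((3, 2))                     # layer 1: 3 inputs -> 2 neurons
w2 = rng.random((2, 1))                     # layer 2: 2 inputs -> 1 neuron
h = 1 / (1 + np.exp(-np.dot(inputs, w1)))   # (10,3) @ (3,2) -> (10,2)
out = 1 / (1 + np.exp(-np.dot(h, w2)))      # (10,2) @ (2,1) -> (10,1)
print(h.shape, out.shape)                   # (10, 2) (10, 1)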
np.dot performs matrix multiplication when its arguments are matrices. It looks like your code is trying to multiply two non-square matrices with the same dimensions, which doesn't work. Perhaps you meant to transpose one of the matrices? NumPy arrays have a T property that returns the transpose; you could try:
self.__sigmoid(np.dot(inputs.T, self.layer1.weights))