I am trying to carry out tensor multiplication in NumPy/TensorFlow.
I have 3 tensors: A (M x h), B (h x N x s), C (s x T).
I believe that A x B x C should produce a tensor D (M x N x T).
Here's the code (using both NumPy and TensorFlow):
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session)

M = 5
N = 2
T = 3
h = 2
s = 3
A_np = np.random.randn(M, h)
C_np = np.random.randn(s, T)
B_np = np.random.randn(h, N, s)
A_tf = tf.Variable(A_np)
C_tf = tf.Variable(C_np)
B_tf = tf.Variable(B_np)
# Tensorflow
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(A_tf))
    p = tf.matmul(A_tf, B_tf)
    sess.run(p)
This returns the following error:
ValueError: Shape must be rank 2 but is rank 3 for 'MatMul_2' (op: 'MatMul') with input shapes: [5,2], [2,2,3].
If we try the multiplication with NumPy arrays only, we get the following error:
np.multiply(A_np, B_np)
ValueError: operands could not be broadcast together with shapes (5,2) (2,2,3)
However, we can use np.tensordot as follows:
np.tensordot(np.tensordot(A_np, B_np, axes=1), C_np, axes=1)
Is there an equivalent operation in TensorFlow?
Answer
In NumPy, we would do as follows:
ABC_np = np.tensordot(np.tensordot(A_np, B_np, axes=1), C_np, axes=1)
In TensorFlow, we would do as follows:
AB_tf = tf.tensordot(A_tf, B_tf, axes=[[1], [0]])
AB_tf_C_tf = tf.tensordot(AB_tf, C_tf, axes=[[2], [0]])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    ABC_tf = sess.run(AB_tf_C_tf)
np.allclose(ABC_np, ABC_tf) returns True.
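As an alternative not used in the answer above, the whole A x B x C contraction can be written in one call with einsum (NumPy has np.einsum and TensorFlow has tf.einsum). A NumPy sketch with random data of the same shapes, checked against the nested tensordot:

```python
import numpy as np

M, N, T, h, s = 5, 2, 3, 2, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((M, h))
B = rng.standard_normal((h, N, s))
C = rng.standard_normal((s, T))

# Contract A's h axis with B's first axis, and B's s axis with C's first axis.
D_einsum = np.einsum('mh,hns,st->mnt', A, B, C)

# The nested tensordot from the answer, for comparison.
D_tensordot = np.tensordot(np.tensordot(A, B, axes=1), C, axes=1)

print(D_einsum.shape)                      # (5, 2, 3)
print(np.allclose(D_einsum, D_tensordot))  # True
```

The subscript string makes the contracted axes (h and s) explicit, which can be easier to read than two chained axes arguments.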
Try
tf.tensordot(A_tf, B_tf,axes = [[1], [0]])
For example:
x=tf.tensordot(A_tf, B_tf,axes = [[1], [0]])
x.get_shape()
TensorShape([Dimension(5), Dimension(2), Dimension(3)])
Here is the tensordot documentation, and here is the relevant GitHub repository.
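To see where the shape (5, 2, 3) comes from: contracting axis 1 of A with axis 0 of B is the same as an ordinary 2-D matmul after flattening B's remaining axes. A NumPy sketch (np.tensordot accepts the same axes argument as tf.tensordot; the data is random):

```python
import numpy as np

M, h, N, s = 5, 2, 2, 3
rng = np.random.default_rng(1)
A = rng.standard_normal((M, h))
B = rng.standard_normal((h, N, s))

# tensordot over A's axis 1 and B's axis 0 ...
AB = np.tensordot(A, B, axes=[[1], [0]])

# ... equals flattening B's trailing axes, doing a plain 2-D matmul,
# and restoring the trailing shape afterwards.
AB_manual = (A @ B.reshape(h, N * s)).reshape(M, N, s)

print(AB.shape)                     # (5, 2, 3)
print(np.allclose(AB, AB_manual))   # True
```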
I am trying to reproduce the results generated by TensorFlow's LSTMCell to be sure that I understand what it does.
Here is my TensorFlow code:
import numpy as np
import tensorflow as tf

num_units = 3
lstm = tf.nn.rnn_cell.LSTMCell(num_units = num_units)
timesteps = 7
num_input = 4
X = tf.placeholder("float", [None, timesteps, num_input])
x = tf.unstack(X, timesteps, 1)
outputs, states = tf.contrib.rnn.static_rnn(lstm, x, dtype=tf.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
x_val = np.random.normal(size = (1, 7, num_input))
res = sess.run(outputs, feed_dict = {X:x_val})
for e in res:
    print(e)
Here is its output:
[[-0.13285545 -0.13569424 -0.23993783]]
[[-0.04818152 0.05927373 0.2558436 ]]
[[-0.13818116 -0.13837864 -0.15348436]]
[[-0.232219 0.08512601 0.05254192]]
[[-0.20371495 -0.14795329 -0.2261929 ]]
[[-0.10371902 -0.0263292 -0.0914975 ]]
[[0.00286371 0.16377522 0.059478 ]]
And here is my own implementation:
n_steps, _ = X.shape
h = np.zeros(shape = self.hid_dim)
c = np.zeros(shape = self.hid_dim)
for i in range(n_steps):
    x = X[i,:]
    vec = np.concatenate([x, h])
    #vec = np.concatenate([h, x])
    gs = np.dot(vec, self.kernel) + self.bias
    g1 = gs[0*self.hid_dim : 1*self.hid_dim]
    g2 = gs[1*self.hid_dim : 2*self.hid_dim]
    g3 = gs[2*self.hid_dim : 3*self.hid_dim]
    g4 = gs[3*self.hid_dim : 4*self.hid_dim]
    I = vsigmoid(g1)
    N = np.tanh(g2)
    F = vsigmoid(g3)
    O = vsigmoid(g4)
    c = c*F + I*N
    h = O * np.tanh(c)
    print(h)
And here is its output:
[-0.13285543 -0.13569425 -0.23993781]
[-0.01461723 0.08060743 0.30876374]
[-0.13142865 -0.14921292 -0.16898363]
[-0.09892188 0.11739943 0.08772941]
[-0.15569218 -0.15165766 -0.21918869]
[-0.0480604 -0.00918626 -0.06084118]
[0.0963612 0.1876516 0.11888081]
As you might notice I was able to reproduce the first hidden vector, but the second one and all the following ones are different. What am I missing?
I examined this link, and your code is almost perfect, but you forgot to add the forget_bias value (default 1.0) in the line F = vsigmoid(g3). It should actually be F = vsigmoid(g3 + self.forget_bias), which in your case is F = vsigmoid(g3 + 1).
Here is my implementation with NumPy:
import numpy as np
import tensorflow as tf
num_units = 3
lstm = tf.nn.rnn_cell.LSTMCell(num_units = num_units)
batch=1
timesteps = 7
num_input = 4
X = tf.placeholder("float", [batch, timesteps, num_input])
x = tf.unstack(X, timesteps, 1)
outputs, states = tf.contrib.rnn.static_rnn(lstm, x, dtype=tf.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
x_val = np.reshape(range(28),[batch, timesteps, num_input])
res = sess.run(outputs, feed_dict = {X:x_val})
for e in res:
    print(e)
print("\nmy imp\n")
#my impl
def sigmoid(x):
    return 1/(1+np.exp(-x))
kernel,bias=sess.run([lstm._kernel,lstm._bias])
f_b_=lstm._forget_bias
# the state size is num_units (it equals num_input-1 here only by coincidence)
c,h=np.zeros([batch,num_units]),np.zeros([batch,num_units])
for step in range(timesteps):
    inpt=np.split(x_val,timesteps,1)[step][0]
    lstm_mtrx=np.matmul(np.concatenate([inpt,h],1),kernel)+bias
    i,j,f,o=np.split(lstm_mtrx,4,1)
    c=sigmoid(f+f_b_)*c+sigmoid(i)*np.tanh(j)
    h=sigmoid(o)*np.tanh(c)
    print(h)
output:
[[ 0.06964055 -0.06541953 -0.00682676]]
[[ 0.005264 -0.03234607 0.00014838]]
[[ 1.617855e-04 -1.316892e-02 8.596722e-06]]
[[ 3.9425286e-06 -5.1347450e-03 7.5078127e-08]]
[[ 8.7508155e-08 -1.9560163e-03 6.3853928e-10]]
[[ 1.8867894e-09 -7.3784427e-04 5.8551406e-12]]
[[ 4.0385355e-11 -2.7728223e-04 5.3957669e-14]]
my imp
[[ 0.06964057 -0.06541953 -0.00682676]]
[[ 0.005264 -0.03234607 0.00014838]]
[[ 1.61785520e-04 -1.31689185e-02 8.59672610e-06]]
[[ 3.94252745e-06 -5.13474567e-03 7.50781122e-08]]
[[ 8.75080644e-08 -1.95601574e-03 6.38539112e-10]]
[[ 1.88678843e-09 -7.37844070e-04 5.85513438e-12]]
[[ 4.03853841e-11 -2.77282006e-04 5.39576024e-14]]
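The missing forget bias also explains why the first hidden vector in the question matched while the later ones did not: the cell state starts at zero, so at the first step the forget gate multiplies zero and its bias has no effect. A NumPy sketch of two LSTM steps with and without the bias, using the i, j, f, o gate order from the code above; the uniform random kernel and zero bias are made-up stand-ins, not TensorFlow's actual initialization:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, c, h, kernel, bias, forget_bias):
    # One step with gate order i, j, f, o (as in the answers above).
    z = np.concatenate([x, h]) @ kernel + bias
    i, j, f, o = np.split(z, 4)
    c_new = sigmoid(f + forget_bias) * c + sigmoid(i) * np.tanh(j)
    h_new = sigmoid(o) * np.tanh(c_new)
    return c_new, h_new

num_input, num_units = 4, 3
rng = np.random.default_rng(0)
kernel = rng.uniform(-0.5, 0.5, size=(num_input + num_units, 4 * num_units))
bias = np.zeros(4 * num_units)
xs = rng.standard_normal((2, num_input))  # two time steps

results = {}
for fb in (0.0, 1.0):
    c = h = np.zeros(num_units)
    hs = []
    for x in xs:
        c, h = lstm_step(x, c, h, kernel, bias, fb)
        hs.append(h)
    results[fb] = hs

# Step 1: c starts at zero, so the forget bias cannot change the output.
print(np.allclose(results[0.0][0], results[1.0][0]))  # True
# Step 2: c is now non-zero, so omitting forget_bias changes the output.
print(np.allclose(results[0.0][1], results[1.0][1]))  # False
```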
TensorFlow uses the glorot_uniform initializer for the LSTM kernel, which samples weights from a random uniform distribution. We need to fix the kernel's value to get reproducible results:
import tensorflow as tf
import numpy as np
np.random.seed(0)
timesteps = 7
num_input = 4
x_val = np.random.normal(size = (1, timesteps, num_input))
num_units = 3
def glorot_uniform(shape):
    limit = np.sqrt(6.0 / (shape[0] + shape[1]))
    return np.random.uniform(low=-limit, high=limit, size=shape)
kernel_init = glorot_uniform((num_input + num_units, 4 * num_units))
My implementation of the LSTMCell (actually just a slightly rewritten version of TensorFlow's code):
def sigmoid(x):
    return 1. / (1 + np.exp(-x))

class LSTMCell():
    """Long short-term memory unit (LSTM) recurrent network cell.
    """
    def __init__(self, num_units, initializer=glorot_uniform,
                 forget_bias=1.0, activation=np.tanh):
        """Initialize the parameters for an LSTM cell.
        Args:
          num_units: int, The number of units in the LSTM cell.
          initializer: The initializer to use for the kernel matrix. Default: glorot_uniform
          forget_bias: Biases of the forget gate are initialized by default to 1
            in order to reduce the scale of forgetting at the beginning of
            the training.
          activation: Activation function of the inner states. Default: np.tanh.
        """
        self._num_units = num_units
        self._forget_bias = forget_bias
        self._activation = activation
        self._initializer = initializer

    def build(self, inputs_shape):
        input_depth = inputs_shape[-1]
        h_depth = self._num_units
        self._kernel = self._initializer(shape=(input_depth + h_depth, 4 * self._num_units))
        self._bias = np.zeros(shape=(4 * self._num_units))

    def call(self, inputs, state):
        """Run one step of LSTM.
        Args:
          inputs: input numpy array, `[input_size]` or `[batch, input_size]`.
          state: a tuple of numpy arrays `(c_state, m_state)`, each shaped
            `[num_units]` or `[batch, num_units]` to match `inputs`.
        Returns:
          A tuple containing:
          - A numpy array of shape `[..., output_dim]` representing the output of the
            LSTM after reading `inputs` when previous state was `state`.
            Here output_dim is equal to num_units.
          - Numpy array(s) representing the new state of LSTM after reading `inputs` when
            the previous state was `state`. Same type and shape(s) as `state`.
        """
        (c_prev, m_prev) = state
        # i = input_gate, j = new_input, f = forget_gate, o = output_gate
        lstm_matrix = np.hstack([inputs, m_prev]).dot(self._kernel)
        lstm_matrix += self._bias
        # Split along the last axis so that both 1-D and 2-D (batched) inputs work.
        i, j, f, o = np.split(lstm_matrix, indices_or_sections=4, axis=-1)
        c = (sigmoid(f + self._forget_bias) * c_prev + sigmoid(i) *
             self._activation(j))
        m = sigmoid(o) * self._activation(c)
        new_state = (c, m)
        return m, new_state
X = x_val.reshape(x_val.shape[1:])
cell = LSTMCell(num_units, initializer=lambda shape: kernel_init)
cell.build(X.shape)
state = (np.zeros(num_units), np.zeros(num_units))
for i in range(timesteps):
    x = X[i,:]
    output, state = cell.call(x, state)
    print(output)
Produces output:
[-0.21386017 -0.08401277 -0.25431477]
[-0.22243588 -0.25817422 -0.1612211 ]
[-0.2282134 -0.14207162 -0.35017249]
[-0.23286737 -0.17129192 -0.2706512 ]
[-0.11768674 -0.20717363 -0.13339118]
[-0.0599215 -0.17756104 -0.2028935 ]
[ 0.11437953 -0.19484555 0.05371994]
Your TensorFlow code, if you replace the second line with
lstm = tf.nn.rnn_cell.LSTMCell(num_units = num_units, initializer = tf.constant_initializer(kernel_init))
returns:
[[-0.2138602 -0.08401276 -0.25431478]]
[[-0.22243595 -0.25817424 -0.16122109]]
[[-0.22821338 -0.1420716 -0.35017252]]
[[-0.23286738 -0.1712919 -0.27065122]]
[[-0.1176867 -0.2071736 -0.13339119]]
[[-0.05992149 -0.177561 -0.2028935 ]]
[[ 0.11437953 -0.19484554 0.05371996]]
Here is a blog post that answers many conceptual questions about LSTMs. It seems that a lot goes into building an LSTM from scratch!
Of course, this answer doesn't solve your question; it just gives a direction.
From a linear-algebra point of view, there may be a dimension mismatch in the multiplication I*N (red circle in the image), which would affect the output, since an n x m matrix multiplied by an m x p matrix gives an n x p output.
Let x be a tensor of filters of shape (n, h, w, c).
I want to apply the tf.nn.softmax() function to every filter in that tensor. How can I do that?
I tried the following but got an error:
import tensorflow as tf
import numpy as np
n, c = 2, 2
h, w = 2, 2
x = tf.ones([n, h, w, c])
y = tf.nn.softmax(x, axis=[1,2])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(x)
    print("x", sess.run(x))
    print("\n")
    print(y)
    print("y", sess.run(y))
After the operation, I would expect every filter to be
0.25 0.25
0.25 0.25
Here is my solution:
Reshape x as follows:
x_r = tf.reshape(x, [n, -1, c])
Apply softmax along the flattened spatial axis (axis 1), i.e. over all h*w positions of each filter:
y_r = tf.nn.softmax(x_r, axis=1)
Recover the original shape:
y = tf.reshape(y_r, [n, h, w, c])
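The reshape trick can be checked in NumPy. NumPy has no built-in softmax, so the softmax helper below is our own (numerically stabilized by subtracting the max); the shapes match the question:

```python
import numpy as np

def softmax(x, axis):
    # Subtract the max for numerical stability, then normalize.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

n, h, w, c = 2, 2, 2, 2
x = np.ones((n, h, w, c))

x_r = x.reshape(n, -1, c)      # (n, h*w, c)
y_r = softmax(x_r, axis=1)     # normalize over the h*w positions
y = y_r.reshape(n, h, w, c)    # back to (n, h, w, c)

print(y[0, :, :, 0])
# [[0.25 0.25]
#  [0.25 0.25]]
```

Because the reshape keeps the channel axis last and only merges h and w, each filter is normalized independently over its spatial positions.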
So, I want to multiply a matrix by a matrix. When I multiply an array by a matrix, it works:
import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 3])
W = tf.Variable(tf.ones([3, 3]))
y = tf.matmul(x, W)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    curr_y = sess.run(y, feed_dict={x: [[1,2,3],[0,4,5]]})
    print(curr_y)
So the array has batch size 2 and shape 3x1, which means I can multiply the 3x3 matrix with the 3x1 array.
But when W is again a 3x3 matrix and the input is, this time, a batch (size 2) of matrices with shape 3x2 instead of arrays, it doesn't work:
import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 3,3])
W = tf.Variable(tf.ones([3, 3]))
y = tf.matmul(x, W)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    curr_y = sess.run(y, feed_dict={x: [[[1,2,3],[1,2,3]],[[1,1,4],[0,4,5]]]})
    print(curr_y)
ValueError: Shape must be rank 2 but is rank 3 for 'MatMul' (op: 'MatMul') with input shapes: [?,3,3], [3,3].
EDIT:
Sorry, what I actually want to do is to matmul a matrix with a batch of matrices or arrays. So I don't want to do
y = tf.matmul(x, W)
actually, I want to do
y = tf.matmul(W, x)
The input you feed to tensor x has shape (2, 2, 3).
You're trying to do a matrix multiplication of (2, 2, 3) and (3, 3). They don't have the same rank, and that's the reason for the error.
From the official TensorFlow documentation:
https://www.tensorflow.org/api_docs/python/tf/matmul
Args:
a: Tensor of type float16, float32, float64, int32, complex64, complex128 and rank > 1.
b: Tensor with same type and rank as a.
When you multiply matrices, their shapes need to follow the rule
(a, b) * (b, c) = (a, c)
Keep in mind that W, as you defined it, has shape (3, 3).
This feed_dict={x: [[1,2,3],[0,4,5]]} is a 2D array with shape (2, 3):
In [67]: x = [[1, 2, 3], [0, 4, 5]]
In [68]: x = np.array(x)
In [69]: x.shape
Out[69]: (2, 3)
It follows the rule (2, 3) * (3, 3) => (2, 3)
In your second example, however, the shapes don't follow this rule. Your input has shape (2, 2, 3), which does not even have the same rank as your W, so it won't work:
In [70]: foo = [[[1,2,3],[1,2,3]],[[1,1,4],[0,4,5]]]
In [71]: foo = np.array(foo)
In [72]: foo.shape
Out[72]: (2, 2, 3)
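For the edited question (applying W to every matrix in a batch), here is a NumPy sketch of three equivalent ways to write the batched product, using the same W and the same (2, 2, 3) input as above. The reshape form needs only a 2-D matmul, so it translates directly to tf.reshape plus tf.matmul; tf.einsum is the TensorFlow counterpart of np.einsum. Whether tf.matmul itself broadcasts a 2-D operand over a batch depends on the TensorFlow version (the error above shows the version used there did not).

```python
import numpy as np

W = np.ones((3, 3))
x = np.array([[[1, 2, 3], [1, 2, 3]],
              [[1, 1, 4], [0, 4, 5]]], dtype=float)  # shape (2, 2, 3)

# x @ W for every matrix in the batch -> shape (2, 2, 3)
y_reshape = (x.reshape(-1, 3) @ W).reshape(2, 2, 3)  # flatten batch, one 2-D matmul
y_einsum = np.einsum('bij,jk->bik', x, W)             # contraction written out
y_broadcast = x @ W                                   # NumPy broadcasts W over the batch

# W @ x needs each matrix in the batch to have 3 rows, e.g. shape (2, 3, 2)
x_t = np.transpose(x, (0, 2, 1))
z_einsum = np.einsum('ij,bjk->bik', W, x_t)           # shape (2, 3, 2)

print(y_reshape.shape, z_einsum.shape)  # (2, 2, 3) (2, 3, 2)
print(np.allclose(y_reshape, y_einsum), np.allclose(y_reshape, y_broadcast))  # True True
```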
I have two tensors in TensorFlow: the first is 3-D and the second is 2-D. I want to multiply them like this:
x = tf.placeholder(tf.float32, shape=[sequence_length, batch_size, hidden_num])
w = tf.get_variable("w", [hidden_num, 50])
b = tf.get_variable("b", [50])
output_list = []
for step_index in range(sequence_length):
    output = tf.matmul(x[step_index, :, :], w) + b
    output_list.append(output)
output = tf.stack(output_list)  # tf.pack was renamed tf.stack in TF 1.0
I use a loop to do the multiplication, but I think it is too slow. What would be the best way to make this as simple and clean as possible?
You could use batch_matmul. Unfortunately, batch_matmul doesn't seem to support broadcasting along the batch dimension, so you have to tile your w matrix. This uses more memory, but all operations stay in TensorFlow:
a = tf.ones((5, 2, 3))
b = tf.ones((3, 1))
b = tf.reshape(b, (1, 3, 1))
b = tf.tile(b, [5, 1, 1])
c = tf.batch_matmul(a, b) # use tf.matmul in TF 1.0
sess = tf.InteractiveSession()
sess.run(tf.shape(c))
This gives
array([5, 2, 1], dtype=int32)
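To check that the tiling approach reproduces the loop from the question, here is a NumPy sketch with small made-up sizes; np.tile and np.matmul stand in for tf.tile and the batched matmul, and the bias b is added by broadcasting:

```python
import numpy as np

seq_len, batch, hidden, out = 4, 2, 5, 3
rng = np.random.default_rng(0)
x = rng.standard_normal((seq_len, batch, hidden))
w = rng.standard_normal((hidden, out))
b = rng.standard_normal(out)

# The per-step loop from the question.
looped = np.stack([x[t] @ w + b for t in range(seq_len)])

# Tile w along a new leading axis, then do one batched matmul.
w_tiled = np.tile(w[np.newaxis], (seq_len, 1, 1))  # (seq_len, hidden, out)
tiled = np.matmul(x, w_tiled) + b                  # (seq_len, batch, out)

print(tiled.shape)                 # (4, 2, 3)
print(np.allclose(looped, tiled))  # True
```

The tiled copy of w costs seq_len times its memory, which is the trade-off the answer mentions.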
You could use map_fn, which applies a function to each slice of a tensor along its first dimension.
x = tf.placeholder(tf.float32, shape=[sequence_length, batch_size, hidden_num])
w = tf.get_variable("w", [hidden_num, 50])
b = tf.get_variable("b", [50])
def mul_fn(current_input):
    return tf.matmul(current_input, w) + b
output = tf.map_fn(mul_fn, x)
I used this at one point to implement a softmax scan along a sequence.
I got an error when trying to build a simple binary classifier for the XOR case using Theano. It reports a dimension mismatch, but I can't figure out which variable causes it.
The strange part is that my program works when I change the number of neurons in the last layer: when I use 2 neurons in the last layer, make that layer a softmax layer, and use the negative log-likelihood (multiclass classification style), the program works fine.
This is my full code:
import numpy as np
import theano
import theano.tensor as T
class HiddenLayer(object):
    def __init__(self, input, nIn, nOut, is_last, W=None):
        self.input = input
        W_val = np.random.randn(nIn,nOut)*0.001
        b_val = np.zeros((nOut,))
        self.W = theano.shared(np.asarray(W_val,dtype=theano.config.floatX),
                               name='W',borrow=True)
        self.b = theano.shared(np.asarray(b_val,dtype=theano.config.floatX),
                               name='b',borrow=True)
        self.z = T.dot(input,self.W) + self.b
        if is_last == 0:
            self.output = T.switch(self.z < 0 , 0 ,self.z)
        else:
            self.output = T.nnet.sigmoid(self.z)
        self.y_pred = self.output > 0.5
        self.params = [self.W, self.b]

    def cost_function(self,y):
        return -T.mean(y*T.log(self.output)+(1-y)*T.log(1-self.output))

    def errors(self,y):
        return T.mean(T.neq(self.y_pred,y))

alfa = 1
epoch = 1000
neu = 5
inpx = np.array([[1,0],[1,1],[0,0],[0,1]])
inpy = np.array([1,0,0,1])
x = T.fmatrix('x')
y = T.ivector('y')
layer0 = HiddenLayer(
    input = x,
    nIn = 2,
    nOut = neu,
    is_last=0
)
layer1 = HiddenLayer(
    input = layer0.output,
    nIn = neu,
    nOut = 1,
    is_last=1
)
params = layer0.params + layer1.params
cost = layer1.cost_function(y)
grads = T.grad(cost, params)
updates = [(param_i, param_i - alfa * grad_i) for param_i, grad_i in zip(params, grads)]
eror = layer1.errors(y)
train_model = theano.function([x,y], [eror,cost],updates=updates,allow_input_downcast=True)
test_model = theano.function([x,y],[eror,layer1.y_pred],allow_input_downcast=True)
for i in range(epoch):
    etr,ctr = train_model(inpx, inpy)
    if i % (epoch // 10) == 0:
        print(etr, ctr)
et,pt = test_model(inpx,inpy)
print(pt)
and the error:
ValueError: Input dimension mis-match. (input[0].shape[1] = 1, input[1].shape[1] = 4)
Apply node that caused the error: Elemwise{neq,no_inplace}(sigmoid.0, DimShuffle{x,0}.0)
Toposort index: 41
Inputs types: [TensorType(float32, matrix), TensorType(int32, row)]
Inputs shapes: [(4L, 1L), (1L, 4L)]
Inputs strides: [(4L, 4L), (16L, 4L)]
Inputs values: [array([[ 0.94264328],
[ 0.99725735],
[ 0.5 ],
[ 0.95675617]], dtype=float32), array([[1, 0, 0, 1]])]
Outputs clients: [[Shape(Elemwise{neq,no_inplace}.0), Sum{acc_dtype=int64}(Elemwise{neq,no_inplace}.0)]]
Thank you in advance for any help.
Your problem is with your y and inpy variables: y is meant to be the expected output of the network. Your network is given a dataset of 4 elements with 2 features each, so your input matrix has 4 rows and 2 columns. You therefore expect 4 elements in your predicted output, i.e. 4 rows in y (and inpy), but you are using a vector, which Theano treats as a row vector, so it has only one row (shape (1, 4) in the error message). You need either to transpose y when computing the cost, or to define y as a matrix and make inpy a (4, 1) matrix instead of a (4,) vector (once again, vectors are row vectors in Theano).
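NumPy handles the same shapes differently: instead of raising an error, it silently broadcasts the (4, 1) prediction against the (4,) targets to a (4, 4) result, which hides the mistake. A small NumPy sketch of the shapes involved (the prediction values are made up for illustration) and of the two fixes described above:

```python
import numpy as np

y_pred = np.array([[True], [True], [True], [True]])  # shape (4, 1), like the network output
y = np.array([1, 0, 0, 1])                           # shape (4,), like inpy

# (4, 1) vs (4,) broadcasts to (4, 4) instead of comparing element-wise.
print(np.not_equal(y_pred, y).shape)                 # (4, 4)

# Fix 1: make the targets a column, shape (4, 1).
print(np.not_equal(y_pred, y.reshape(-1, 1)).shape)  # (4, 1)

# Fix 2: flatten the prediction to shape (4,).
print(np.not_equal(y_pred[:, 0], y).shape)           # (4,)
```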
Hope this helps,
Best