I have a batch of N integer sequences of length L, embedded into an N*L*d tensor. These sequences are auto-encoded by my network architecture. So I have:
from theano import tensor as T
X = T.imatrix('X') # N*L elements in [0,C]
EMB = T.tensor3('Embedding') # N*L*d
... # some code goes here :-)
PY = T.tensor3('PY') # N*L*C probability of the predicted class in [0,C]
cost = -T.log(PY[X])
As far as I could tell, this kind of indexing only works on the first dimension of the tensor, so I had to use a theano.scan. Is there a way to index the tensor directly?
Sounds like you want a 3-dimensional version of theano.tensor.nnet.categorical_crossentropy?
If so, then I think you could simply flatten the matrix of true class label indexes into a vector and the 3D tensor of predicted class probabilities into a matrix and then use the built in function.
cost = T.nnet.categorical_crossentropy(
    PY.reshape((PY.shape[0] * PY.shape[1], PY.shape[2])),
    X.flatten())
The order of entries in PY may need to be adjusted first (e.g. via a dimshuffle) to make sure the entries in the matrix and vector being compared correspond to each other.
Here we assume, as the question suggests, that the sequences are not padded -- they are all exactly L elements in length. If the sequences are actually padded then you may need to do something much more complicated to avoid computing cost elements inside the padding regions.
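For the record, the direct indexing asked about is also possible via numpy-style advanced indexing, which Theano supports (a minimal sketch, assuming PY is N*L*C and X is N*L as above):
n = T.arange(PY.shape[0]).dimshuffle(0, 'x')  # shape (N, 1)
l = T.arange(PY.shape[1]).dimshuffle('x', 0)  # shape (1, L)
# The index arrays broadcast against X, picking PY[i, j, X[i, j]] for every position.
cost = -T.log(PY[n, l, X]).mean()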
I have embeddings of knowledge graphs from RDF2vec with dimensions KG.shape = (7536, 500), and embeddings of textual sentences from Sentence-BERT with dimensions text.shape = (14169, 384). I want to concatenate these embeddings of different dimensions. I tried making the column dimensions equal by creating z_text = numpy.zeros((14169, 116), dtype=numpy.float32) and appending it to the text embeddings with final_text = numpy.append(text, z_text, axis=1), so that the dimensions become (14169, 500) and match the KG dimensions (500 columns).
Is this approach okay, or is there a better alternative? Will padding with numpy.zeros affect the performance of the model during training? Could anyone help?
The technical way to do this is as follows. You start with your two arrays (shapes taken from the question):
import numpy as np
a = np.ones((7536, 500))   # KG embeddings
b = np.ones((14169, 384))  # text embeddings
Then you create your zeros:
z = np.zeros((14169, 116))
Then you append the zeros along the second dimension, axis=1 (for np.concatenate, the shapes may differ only along the concatenation axis):
bz = np.concatenate([b, z], axis=1)
Finally, you can combine one of your original arrays with the second padded array, this time along the first dimension (axis=0):
c = np.concatenate([a, bz])
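For what it's worth, np.pad can produce the zero padding in one call (a sketch with the same shapes as above):
# Pad b with trailing zero columns up to a's width, then stack along axis 0.
bz = np.pad(b, ((0, 0), (0, a.shape[1] - b.shape[1])))
c = np.concatenate([a, bz])
print(c.shape)  # (21705, 500)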
However, to reiterate my comment:
Why would you want to concatenate embeddings of different sizes? It has no real meaning mathematically, as you wouldn't be able to perform any truly useful transformations down the line. Of course, you could pad the shorter embeddings to 500 just to make this happen, but it would be a pretty worthless matrix for all intents and purposes.
I have a tensor of shape "torch.Size([2, 2, 3])" and another tensor of shape "torch.Size([2, 1, 3])". I want a concatenated tensor of shape "torch.Size([2, 2, 6])".
For example:
a = torch.tensor([[[2, 3, 5], [12, 13, 15]],
                  [[20, 30, 50], [120, 130, 150]]])
b = torch.tensor([[[99, 99, 99]], [[999, 999, 999]]])
I want the output to be:
[[[99, 99, 99, 2, 3, 5], [99, 99, 99, 12, 13, 15]], [[999, 999, 999, 20, 30, 50], [999, 999, 999, 120, 130, 150]]]
I have written an O(n²) solution using two for loops, but it is taking a lot of time with millions of calculations. Can anyone help me do this efficiently? Maybe there is some matrix trick for tensors?
To exactly match the example you have provided:
c = torch.cat([b.repeat([1, a.shape[1] // b.shape[1], 1]), a], 2)
The reasoning behind this is that the concatenate operation in pytorch (and numpy and other libraries) will complain if the dimensions of the two tensors in the non-specified axes (in this case 0 and 1) do not match. Therefore, you have to repeat the tensor along the non-matching axis (axis 1, hence the second element of the repeat list) in order to make the dimensions align. Note that the solution here will only work if the middle dimension of a is evenly divisible by the middle dimension of b.
In newer versions of pytorch, this can also be done using the torch.tile() function.
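Since b's middle dimension is 1 in the example, broadcasting with expand() also works and avoids materializing the repeated copies (a minimal sketch):
import torch

a = torch.tensor([[[2, 3, 5], [12, 13, 15]],
                  [[20, 30, 50], [120, 130, 150]]])
b = torch.tensor([[[99, 99, 99]], [[999, 999, 999]]])

# expand() returns a view, so no data is copied before the concatenation.
c = torch.cat([b.expand(-1, a.shape[1], -1), a], dim=2)
print(c.shape)  # torch.Size([2, 2, 6])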
I'm trying to do a grid search over a model I've trained: producing a mesh, then evaluating the model on that mesh to find a maximum.
I'm producing the mesh with:
import numpy as np

def generate_random_grid(n_scanning_parameters, n_points_each_dimension):
    points = np.linspace(1, 0, num=n_points_each_dimension, endpoint=False)
    x_points = [points for dimension in range(n_scanning_parameters)]
    mesh = np.array(np.meshgrid(*x_points))
    return mesh
As you can see, I don't know the dimensions in advance. So later when I want to index the mesh to predict different points, I don't know how to index.
E.g., if I have 4 dimensions and 10 points along each dimension, the mesh has the shape (4, 10, 10, 10, 10), and I need to access points like [:,0,0,0,0] or [:,1,2,3,4], which would give me a 1-D vector with 4 elements.
Now I can produce the last 4 indices using
for index in np.ndindex(*mesh.shape[1:]):
but indexing my mesh like mesh[:, index] doesn't result in a 1-D vector with 4 elements as I expect it to.
How can I index the mesh?
Since you're working with tuples, and numpy supports tuple indexing, let's start with that.
Effectively, you want to do your slicing like a[:, 0, 0, 0, 0]. But your index is a tuple, and you're attempting something like a[:, (0,0,0,0)] - this gives you four hyperplanes along the second dimension instead. Your indexing should be more like a[(:,0,0,0,0)] - but this gives a syntax error.
So the solution would be to use the slice built-in.
a[(slice(None),0,0,0,0)]
This would give you your one dimensional vector.
In terms of your code, you can simply add the tuples to make this work.
for index in np.ndindex(*mesh.shape[1:]):
    vector = mesh[(slice(None),) + index]
An alternative approach would be to simply use a transposed array and reversed indices. In the transposed array the first dimension ends up last, removing the need for the :.
for index in np.ndindex(*mesh.shape[1:]):
    vector = mesh.T[index[::-1]]
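If the goal is just to visit every grid point, another option is to flatten the trailing axes once and iterate over columns (a sketch, assuming mesh.shape == (4, 10, 10, 10, 10)):
points = mesh.reshape(mesh.shape[0], -1).T  # shape (10000, 4)
for vector in points:
    ...  # vector has shape (4,), same as mesh[:, i1, i2, i3, i4]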
I'm trying to write a function that performs convolution, and I'm getting a little challenged trying to create the output volume using numpy. Specifically, I have an input image represented as an array of dimensions (150, 150, 3). Now, I want to convolve over this image with a set of num_kernels kernels, each an array of dimension (4, 4, 3), and I want these kernels to move over the image with a stride of 2. My thought process has been:
(1) I'll create an output array by taking (4,4,3)-size chunks out of the input array, stretching each out into a row, and ultimately stacking these rows into a large matrix.
(2) Then, I'll create a parameter array composed of all of my (4,4,3) kernels stretched out into rows, which will also make a large matrix.
(3) Then I can dot product these matrices together and reshape the output matrix into the proper dimensions.
My rough pseudo-code attempt at step (1) is as follows.
import numpy as np

def Convolution(input, filter_size, num_filters, stride):
    X = input                                   # e.g. shape (150, 150, 3)
    height, width, channels = X.shape
    rows = []                                   # one flattened chunk per row
    weights = np.zeros((filter_size * filter_size * channels, num_filters))
    # get weights from other function
    for i in range(0, height - filter_size + 1, stride):
        for j in range(0, width - filter_size + 1, stride):
            # take out a (filter_size, filter_size, channels) chunk and stretch it into a row
            rows.append(X[i:i + filter_size, j:j + filter_size, :].flatten())
    output_volume = np.array(rows)              # (n_chunks, filter_size**2 * channels)
    return output_volume @ weights              # dot product output volume and weights
If someone could provide a specific code example of how to implement this (most helpful would be answers to (1) and (2)) in Python (I'm using numpy), it would be much appreciated. Thank you!
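For reference, step (1) can also be vectorized with numpy's sliding_window_view, available in numpy >= 1.20 (a sketch using the shapes from the question):
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

X = np.random.rand(150, 150, 3)

# All (4, 4, 3) windows, subsampled with stride 2 in both spatial dimensions.
windows = sliding_window_view(X, (4, 4, 3))[::2, ::2, 0]  # (74, 74, 4, 4, 3)
patches = windows.reshape(-1, 4 * 4 * 3)                  # (5476, 48)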
The tutorial on MNIST for ML Beginners, in Implementing the Regression, shows how to make the regression on a single line, followed by an explanation that mentions the use of a trick (emphasis mine):
y = tf.nn.softmax(tf.matmul(x, W) + b)
First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from when we multiplied them in our equation, where we had Wx, as a small trick to deal with x being a 2D tensor with multiple inputs.
What is the trick here, and why are we using it?
Well, there's no real trick here. That line just flips the multiplication order of the earlier equation:
# For a single example the equation uses this order:
y = Wx + b
# For a batch of examples you need to change the order of multiplication,
# instead of adding another transpose op:
y = xW + b
# hence
y = tf.matmul(x, W)
Ok, I think the main point is that if you train in batches (i.e. train with several instances of the training set at once), TensorFlow always assumes that the zeroth dimension of x indicates the number of events per batch.
Suppose you want to map a training instance of dimension M to a target instance of dimension N. You would typically do this by multiplying x (a column vector) by an NxM matrix W (and, optionally, adding a bias b of dimension N, also a column vector), i.e.
y = W*x + b, where y is also a column vector.
This is perfectly alright seen from the perspective of linear algebra. But now comes the point with the training in batches, i.e. training with several training instances at once.
To get to understand this, it might be helpful to not view x (and y) as vectors of dimension M (and N), but as matrices with the dimensions Mx1 (and Nx1 for y).
Since TensorFlow assumes that the different training instances constituting a batch are aligned along the zeroth dimension, we get into trouble here since the zeroth dimension is occupied by the different elements of one single instance.
The trick is then to transpose the above equation (remember that transposition of a product also switches the order of the two transposed objects):
y^T = x^T * W^T + b^T
This is pretty much what has been described in short within the tutorial.
Note that y^T is now a matrix of dimension 1xN (practically a row vector), while x^T is a matrix of dimension 1xM (also a row vector). W^T is a matrix of dimension MxN. In the tutorial, they did not write x^T or y^T, but simply defined the placeholders according to this transposed equation. The only point that is not clear to me is why they did not define b the "transposed way". I assume that the + operator broadcasts b across the rows, which yields the correct dimensions.
The rest is now pretty easy: if you have batches larger than 1 instance, you just "stack" multiple of the x (1xM) matrices, say into a matrix of dimensions AxM (where A is the batch size). b will automatically be broadcast to this number of events (that is, to a matrix of dimension AxN). If you then use
y^T = x^T * W^T + b^T,
you will get a (AxN) matrix of the targets for each element of the batch.
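A small numpy sketch of the shapes involved (the concrete numbers are just MNIST-like placeholders):
import numpy as np

A, M, N = 32, 784, 10     # batch size, input dim, output dim
x = np.random.rand(A, M)  # A training instances stacked along axis 0
W = np.random.rand(M, N)  # this is W^T in the column-vector convention above
b = np.random.rand(N)     # broadcast across the batch dimension by +

y = x @ W + b             # one row of targets per batch element
print(y.shape)            # (32, 10)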