I have an N × F tensor of features and an N × 1 tensor of group indices. I want to design a custom PyTorch layer that applies an LSTM to each group's sorted features. I mention an LSTM over sorted group features only as an example; hypothetically it could be anything that supports variable-length input or sequences.
The obvious approach would be to call an LSTM layer for each unique group, but that would be inefficient. Is there a better way to do it?
You can certainly parallelize the LSTM application -- the problem is indexing the feature tensor efficiently.
The best approach I could come up with (I use something similar for my own work) is a list comprehension over the unique group ids to build a list of variable-length tensors, then pad them and run the LSTM on top.
In code:
import torch
from torch import Tensor
from torch.nn.utils.rnn import pad_sequence
n = 13
f = 77
n_groups = 3
xs = torch.rand(n, f)
ids = torch.randint(low=0, high=n_groups, size=(n,))
def groupbyid(xs: Tensor, ids: Tensor, batch_first: bool = True,
              padding_value: float = 0.0) -> Tensor:
    # one variable-length tensor per group, padded to a common length
    return pad_sequence([xs[ids == idx] for idx in ids.unique()],
                        batch_first=batch_first,
                        padding_value=padding_value)
grouped = groupbyid(xs, ids)
print(grouped.shape)
# e.g. torch.Size([3, 5, 77])
You can then apply your LSTM in parallel over the n_groups dimension on the grouped Tensor.
Note that you will also need to inspect the content of ids.unique() to assign each LSTM output to its corresponding group id, but this is easy to write and depends on your application.
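For example, here is a minimal sketch of running an LSTM over the padded grouped tensor; the hidden size and the use of pack_padded_sequence (so the LSTM skips padded positions) are my own assumptions for illustration:
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
hidden_size = 32  # hypothetical
lstm = nn.LSTM(input_size=f, hidden_size=hidden_size, batch_first=True)
# lengths of each group, in the same order as ids.unique()
lengths = torch.tensor([(ids == idx).sum().item() for idx in ids.unique()])
packed = pack_padded_sequence(grouped, lengths, batch_first=True, enforce_sorted=False)
packed_out, (h_n, c_n) = lstm(packed)
out, _ = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)  # (n_groups, max_group_len, hidden_size)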
Related
I'm starting with a 2D tensor of n tokenized seeds. I then generate prediction logits, sort them using argsort(), and take the top m candidates. I'm looking for the best way to append each of these predictions onto the seed that generated it, creating a new 2D tensor that now has n * m seeds, where each seed is one token longer. I'm new to torch and wondering if it has a built-in vectorized way of doing this.
Here's the pseudo-code for what I want it to do:
def predict(seeds, coherence_threshold, batch_size=16):
    """Takes a tensor of tokenized seeds, returns a tensor of seeds with one predicted token appended"""
    dataloader = torch.utils.data.DataLoader(seeds, batch_size=batch_size, shuffle=False)
    new_seeds = []
    with torch.no_grad():
        for batch in dataloader:
            batch_logits = reference_gpt2(batch)
            batch_preds = batch_logits.argsort(dim=-1, descending=True)
            batch_preds_pruned = batch_preds[:, -1, :coherence_threshold]  # top candidates at the last position
            # TODO come up with a more efficient way to do this
            for i in range(len(batch)):
                for j in range(batch_preds_pruned.shape[1]):
                    new_seed = torch.cat((batch[i], batch_preds_pruned[i, j].unsqueeze(0)))
                    new_seeds.append(new_seed)
    return torch.stack(new_seeds)
To append PyTorch completions onto their original seeds in the fastest way possible, you can use the torch.cat() function. This function concatenates tensors along a given dimension, and can be used to efficiently append one tensor to another.
Ex:
import torch
# Assume that original_seed and completion are both tensors of shape (batch_size, sequence_length)
concatenated = torch.cat((original_seed, completion), dim=1)
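For the n * m expansion in the question, one fully vectorized option is to combine repeat_interleave with cat. This is a sketch assuming seeds has shape (n, seq_len) and preds holds the top m token indices per seed with shape (n, m); the sizes and vocabulary bound are made up for illustration:
import torch
n, seq_len, m = 4, 10, 3                              # hypothetical sizes
seeds = torch.randint(0, 50257, (n, seq_len))         # tokenized seeds
preds = torch.randint(0, 50257, (n, m))               # top-m predicted next tokens per seed
expanded = seeds.repeat_interleave(m, dim=0)          # (n*m, seq_len): each seed repeated m times
new_tokens = preds.reshape(-1, 1)                     # (n*m, 1): the matching prediction per row
new_seeds = torch.cat((expanded, new_tokens), dim=1)  # (n*m, seq_len + 1)
print(new_seeds.shape)  # torch.Size([12, 11])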
I am working with REINFORCE algorithm with PyTorch. I noticed that the batch inference/predictions of my simple network with Softmax doesn’t sum to 1 (not even close to 1). I am attaching a minimum working code so that you can reproduce it. What am I missing here?
import numpy as np
import torch
obs_size = 9
HIDDEN_SIZE = 9
n_actions = 2
np.random.seed(0)
model = torch.nn.Sequential(
torch.nn.Linear(obs_size, HIDDEN_SIZE),
torch.nn.ReLU(),
torch.nn.Linear(HIDDEN_SIZE, n_actions),
torch.nn.Softmax(dim=0)
)
state_transitions = np.random.rand(3, obs_size)
state_batch = torch.Tensor(state_transitions)
pred_batch = model(state_batch) # WRONG PREDICTIONS!
print('wrong predictions:\n', *pred_batch.detach().numpy())
# [0.34072137 0.34721774] [0.30972624 0.30191955] [0.3495524 0.3508627]
# DOES NOT SUM TO 1 !!!
pred_batch = [model(s).detach().numpy() for s in state_batch] # CORRECT PREDICTIONS
print('correct predictions:\n', *pred_batch)
# [0.5955179 0.40448207] [0.6574412 0.34255883] [0.624833 0.37516695]
# DOES SUM TO 1 AS EXPECTED
Although PyTorch lets us get away with it, we don’t actually provide an input with the right dimensionality. We have a model that takes one input and produces one output, but PyTorch nn.Module and its subclasses are designed to do so on multiple samples at the same time. To accommodate multiple samples, modules expect the zeroth dimension of the input to be the number of samples in the batch.
Deep Learning with PyTorch
That your model works on each individual sample is an implementation nicety. You have specified the wrong dimension for the softmax (the batch dimension instead of the feature dimension), so when given a batch it computes the softmax across samples instead of within each sample:
nn.Softmax requires us to specify the dimension along which the softmax function is applied:
softmax = nn.Softmax(dim=1)
In this case, we have two input vectors in two rows (just like when we work with
batches), so we initialize nn.Softmax to operate along dimension 1.
Change torch.nn.Softmax(dim=0) to torch.nn.Softmax(dim=1) to get appropriate results.
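For reference, with the corrected dimension (reusing the names from the question) each row of the batched output sums to 1:
model = torch.nn.Sequential(
    torch.nn.Linear(obs_size, HIDDEN_SIZE),
    torch.nn.ReLU(),
    torch.nn.Linear(HIDDEN_SIZE, n_actions),
    torch.nn.Softmax(dim=1)  # softmax within each sample, not across the batch
)
pred_batch = model(state_batch)
print(pred_batch.sum(dim=1))  # tensor([1., 1., 1.]) up to floating point error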
I'm trying to train an N-tuple network using Keras. An N-tuple network is just a sparse array of one-hot activated patterns. Imagine a chess board with 64 squares, each square containing one of N possible piece types, so there are always 64 activated ones out of 64*N possible parameters, stored as a 2D array [64][N]. Or take every possible 2x2 patch of squares, giving N^4 possible configurations for each such patch. Such a network is linear and outputs a single value. The training is plain old SGD and the like.
I successfully trained the network with my own C++ code, using lookup tables and summing. But I wanted to try it in Keras, since Keras allows different optimization algorithms, use of GPUs, etc. For starters I flattened the 2D array into one big vector, but that soon became impractical: there are thousands of possible parameters, of which only a handful (a fixed number) are ones and the rest are zeros.
I was wondering whether in Keras (or a similar library) it is possible to use training data like this: 13,16,11,11,5,...,3, where those numbers are indices, instead of one big vector of 0,0,0,1,0,0,......,1,0,0,0,....,1,0,0,0,...
You could use tf.sparse.SparseTensor(...) for the data and set sparse=True on tf.keras.Input(...).
import tensorflow as tf
def sparse_one_hot(y):
    m = len(y)
    n_classes = len(tf.unique(tf.squeeze(y))[0])
    rows = tf.range(m, dtype='int64')[:, None]
    # indices are (row, class) pairs, matching dense_shape=(m, n_classes)
    indices = tf.concat([rows, y], axis=1)
    ones = tf.ones(shape=(m,), dtype='float32')
    sparse_y = tf.sparse.SparseTensor(indices, ones, dense_shape=(m, n_classes))
    return sparse_y
y = tf.random.uniform(shape=(10, 1), minval=0, maxval=4, dtype=tf.int64)
sparse_y = sparse_one_hot(y)  # sparse_y.values, sparse_y.indices
# set sparse=True for Input
# tf.keras.Input(..., sparse=True, ...)
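A minimal sketch of the Keras side; the feature count and layer sizes are assumptions, and if your TF version's Dense layer does not accept sparse input directly you would need a tf.sparse.to_dense step:
n_features = 64 * 12  # hypothetical: 64 squares x 12 piece types
inp = tf.keras.Input(shape=(n_features,), sparse=True)  # accepts tf.sparse.SparseTensor batches
out = tf.keras.layers.Dense(1)(inp)  # single linear output, as in the N-tuple network
model = tf.keras.Model(inp, out)
model.compile(optimizer='sgd', loss='mse')
# model.fit(sparse_x, targets, ...)  # sparse_x built the same way as sparse_y above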
I am confused about how to format my own pre-trained weights for Keras Embedding layer if I'm also setting mask_zero=True. Here's a concrete toy example.
Suppose I have a vocabulary of 4 words [1,2,3,4] and am using vector weights defined by:
weight[1]=[0.1,0.2]
weight[2]=[0.3,0.4]
weight[3]=[0.5,0.6]
weight[4]=[0.7,0.8]
I want to embed sentences of length up to 5 words, so I have to zero pad them before feeding them into the Embedding layer. I want to mask out the zeros so further layers don't use them.
Reading the Keras docs for Embedding, it says the 0 value can't be in my vocabulary.
mask_zero: Whether or not the input value 0 is a special "padding"
value that should be masked out. This is useful when using recurrent
layers which may take variable length input. If this is True then all
subsequent layers in the model need to support masking or an exception
will be raised. If mask_zero is set to True, as a consequence, index 0
cannot be used in the vocabulary (input_dim should equal size of
vocabulary + 1).
So what I'm confused about is how to construct the weight array for the Embedding layer, since "index 0 cannot be used in the vocabulary." If I build the weight array as
[[0.1,0.2],
[0.3,0.4],
[0.5,0.6],
[0.7,0.8]]
then normally, word 1 would point to index 1, which in this case holds the weights for word 2. Or is it that when you specify mask_zero=True, Keras internally makes it so that word 1 points to index 0? Alternatively, do you just prepend a vector of zeros in index zero, as follows?
[[0.0,0.0],
[0.1,0.2],
[0.3,0.4],
[0.5,0.6],
[0.7,0.8]]
This second option seems to me to put the zero into the vocabulary. In other words, I'm very confused. Can anyone shed light on this?
Your second approach is correct. You will want to construct your embedding layer in the following way:
import numpy as np
from keras.layers import Embedding
embedding = Embedding(
    output_dim=embedding_size,
    input_dim=vocabulary_size + 1,  # + 1 because index 0 is reserved for the padding row
    input_length=input_length,
    mask_zero=True,
    weights=[np.vstack((np.zeros((1, embedding_size)),
                        embedding_matrix))],
    name='embedding'
)(input_layer)
where embedding_matrix is the first matrix you provided; the np.vstack call prepends the zero row for index 0, producing the second matrix you wrote.
You can see this by looking at the implementation of Keras' Embedding layer. Notably, mask_zero is only used to literally mask the inputs:
def compute_mask(self, inputs, mask=None):
if not self.mask_zero:
return None
output_mask = K.not_equal(inputs, 0)
return output_mask
so the mask does not change the lookup itself: every index, including 0, still selects the corresponding row of the kernel, which is why all word indices are effectively shifted up by one and row 0 must hold the padding vector.
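As a quick sanity check, here is a sketch using the toy weights from the question: index 0 retrieves the all-zero padding row and word 1 retrieves [0.1, 0.2] (outputs are approximate):
import numpy as np
from keras.layers import Embedding
from keras.models import Sequential
embedding_matrix = np.array([[0.1, 0.2],
                             [0.3, 0.4],
                             [0.5, 0.6],
                             [0.7, 0.8]])
weights = np.vstack((np.zeros((1, 2)), embedding_matrix))  # row 0 reserved for padding
model = Sequential([Embedding(input_dim=5, output_dim=2, mask_zero=True, weights=[weights])])
print(model.predict(np.array([[0, 1, 4]])))
# [[[0.  0. ]
#   [0.1 0.2]
#   [0.7 0.8]]]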
I have a multi-class (4-class) classification model in Keras.
While training, the model expects the input shape to be (None, None, 300). That is, if there are n different input sequences, the input shape should be (n, None, 300). In my case, the size of each input sequence is different.
Say the input sequences have shapes (1000,300), (1500,300), (1200,300) and (2000,300). Now I need to put them together into shape (4, None, 300). I tried using a numpy array, but a numpy array won't give a shape of (4, None, 300); instead it will be (4L,).
Now I want to know how to train my model. Is it possible with numpy arrays, or are there other data structures available?
Since your sequences are of different durations, you may consider padding them with zeros (adjusting the loss/labels accordingly) and then:
import numpy as np
max_duration = 2000
in_ = np.zeros((4, max_duration, 300), dtype='f4')
for i in range(4):
    # copy each variable-length sequence into the zero-padded batch
    in_[i, :len(seq[i]), :] = seq[i]
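If downstream layers should ignore the padded timesteps, one option is a Masking layer; this is only a sketch under the assumption that a recurrent layer follows, since the actual model in the question is not shown and the layer sizes here are made up:
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense
model = Sequential([
    Masking(mask_value=0.0, input_shape=(None, 300)),  # padded timesteps are skipped downstream
    LSTM(64),                                          # hypothetical recurrent layer
    Dense(4, activation='softmax')                     # 4-class output, as in the question
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
# model.fit(in_, labels, ...)  # in_ is the zero-padded batch built above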