Keras LSTM Batch Size and model.fit() - python

I am training an LSTM in Keras. As per documentation, my training data and labels have shape (20, 20, 1) representing 20 samples with 20 time steps and one feature. When I use model.fit() to train my model, do I need to specify batch size or will all 20 samples be sent as one batch by default?

According to Keras's fit documentation:
batch_size: Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32.
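So with only 20 samples, nothing special is needed: the default batch size of 32 is larger than the dataset, so every epoch runs as a single batch of 20. A minimal sketch to illustrate (the single-LSTM model here is a hypothetical example, not your architecture):

import numpy as np
from tensorflow import keras

# Toy data matching the question: 20 samples, 20 time steps, 1 feature.
x_train = np.random.rand(20, 20, 1)
y_train = np.random.rand(20, 20, 1)

model = keras.Sequential([
    keras.layers.LSTM(8, return_sequences=True, input_shape=(20, 1)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# No batch_size given: Keras defaults to 32, but with only 20 samples
# each epoch still runs as one batch of 20.
model.fit(x_train, y_train, epochs=2)

# Equivalent here, but explicit:
model.fit(x_train, y_train, batch_size=20, epochs=2)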

Tensorflow : Manually selecting the batch when training deep neural networks

# x_train.shape[0] = 54000
model.fit(
    x_train, y_train,
    batch_size=128,
    epochs=12,
    validation_data=(x_val, y_val),
)
When I use this fit() method to train a neural network:
batch_size = 128 means that every epoch I randomly pick 54000 // 128 batches of size 128 from my training dataset.
Are those batches chosen with replacement? I suspect from the docs that they're not, but I'd like confirmation.
Can I manually choose my batches? I would like to focus on specific images and not others for a given batch, by choosing them personally instead of letting randomness choose for me.
Are those batches chosen with replacement?
In each individual epoch, no. Of course the entire dataset is used again in the next epoch.
Can I manually choose my batches? I would like to focus on specific images and not others for a given batch, by choosing them personally instead of letting randomness choose for me.
You should create a custom dataset for this, and leave the rest of the training loop (data loader, model etc.) unchanged.
But be aware that the samples in a minibatch are supposed to be random.
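If you really do want hand-picked batches in Keras, one possible approach (a sketch, not the only way) is a custom keras.utils.Sequence whose __getitem__ returns exactly the samples you chose; the index lists in my_batches below are hypothetical:

import numpy as np
from tensorflow import keras

class HandPickedBatches(keras.utils.Sequence):
    """Yields batches built from explicitly chosen sample indices."""

    def __init__(self, x, y, batch_indices):
        # batch_indices: a list of index arrays, one per batch,
        # chosen by hand instead of by random shuffling.
        self.x, self.y = x, y
        self.batch_indices = batch_indices

    def __len__(self):
        return len(self.batch_indices)

    def __getitem__(self, i):
        idx = self.batch_indices[i]
        return self.x[idx], self.y[idx]

# Hypothetical example: give the "hard" images their own batch.
my_batches = [np.array([0, 5, 9]), np.arange(128, 256)]
# model.fit(HandPickedBatches(x_train, y_train, my_batches), epochs=12)

As noted above, though, deliberately non-random batches bias the gradient estimates, so use this with care.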

matching PyTorch tensor dimensions

I am having some issues with the dimensionality of my tensors in my training function. I am using the MNIST dataset, so there are 10 possible targets, and I originally wrote the prototype code with a training batch size of 10, which in retrospect was not the wisest choice. It gave poor results during some earlier tests, and increasing the number of training iterations brought no benefit. When I then tried to increase the batch size, I realised that what I had written was not very general and that I was likely never training on the proper data. Below is my training function:
def Train(tLoops, Lrate):
    for _ in range(tLoops):
        tempData = train_data.view(batch_size_train, 1, 1, -1)
        output = net(tempData)
        trainTarget = train_targets
        criterion = nn.MSELoss()
        print("target:", trainTarget.size())
        print("Output:", output.size())
        loss = criterion(output, trainTarget.float())
        # print("error is:", loss)
        net.zero_grad()  # zeroes the gradient buffers of all parameters
        loss.backward()
        for j in net.parameters():
            j.data.sub_(j.grad.data * Lrate)
The print statements output
target: torch.Size([100])
Output: torch.Size([100, 1, 1, 10])
before this error message on the line where the loss is calculated:
RuntimeError: The size of tensor a (10) must match the size of tensor b (100) at non-singleton dimension 3
The first print, target, is a 1-dimensional tensor of the ground-truth values for each image. Output contains the network's output for each of those 100 samples, so it should effectively be 100 x 10; however, from reshaping the data from 28 x 28 to 1 x 784 earlier, I seem to have picked up unnecessary extra dimensions. Does PyTorch provide a way to remove these? I couldn't find anything in the documentation. Or is there something else that could be my issue?
There are several problems in your training script. I will address each of them below.
First, you should NOT do data batching by hand. PyTorch/torchvision have functions for that; use a Dataset and a DataLoader: https://pytorch.org/tutorials/recipes/recipes/loading_data_recipe.html.
You should also NEVER update the parameters of your network by hand. Use an Optimizer: https://pytorch.org/docs/stable/optim.html. In your case, plain SGD without momentum will have the same effect.
The dimensionality of your input also seems to be wrong: for MNIST, an input tensor should be (batch_size, 1, 28, 28), or (batch_size, 784) if you're training an MLP. Furthermore, the output of your network should be (batch_size, 10).
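A minimal sketch of what that looks like put together, assuming a small stand-in MLP; note that I've also swapped MSELoss for CrossEntropyLoss so the integer class targets can be used directly, which is my addition rather than part of the answer above:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# A Dataset plus a DataLoader instead of hand-sliced batches.
train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=100, shuffle=True)

# Stand-in MLP: (batch_size, 784) in, (batch_size, 10) logits out.
net = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# An Optimizer instead of the manual j.data.sub_(...) update.
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()  # expects (batch, 10) logits and (batch,) class indices

for images, targets in train_loader:          # images: (100, 1, 28, 28), targets: (100,)
    images = images.view(images.size(0), -1)  # flatten to (100, 784); no stray singleton dims
    optimizer.zero_grad()
    output = net(images)                      # (100, 10)
    loss = criterion(output, targets)
    loss.backward()
    optimizer.step()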

All training samples are not loading during training

I'm just starting with NLP. I loaded the 'imdb_reviews' dataset from tensorflow_datasets.
There are 25,000 training samples, but when I run it, it only trains on 782 "samples". I didn't use batch_size; I just loaded the entire dataset at once, as you can see.
The other hyperparameters are:
vocab_size = 10000
input_length = 120
embedding_dims = 16
Can anyone tell me what I'm doing wrong?
By default, the fit method of tf.keras.Model uses a batch size of 32.
https://www.tensorflow.org/api_docs/python/tf/keras/Model
Since ceil(25,000 / 32) = 782, the 782 you are seeing is the number of batches (steps) per epoch, not the number of samples; the final batch is simply smaller than 32.
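A quick way to check the arithmetic (model, train_data and train_labels below are hypothetical names for your own objects):

import math

num_samples = 25_000
default_batch_size = 32  # what Keras uses when batch_size is not passed to fit()
steps_per_epoch = math.ceil(num_samples / default_batch_size)
print(steps_per_epoch)   # 782 -- the number shown in the progress bar

# To change it, pass batch_size explicitly, e.g.:
# model.fit(train_data, train_labels, batch_size=512, epochs=10)  # 49 steps per epoch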

batch size for LSTM

I've been trying to set up an LSTM model but I'm a bit confused about batch_size. I'm using the Keras module in Tensorflow.
I have 50,000 samples, each has 200 time steps and each time step has three features. So I've shaped my training data as (50000, 200, 3).
I set up my model with four LSTM layers, each having 100 units. For the first layer I specified the input shape as (200, 3). The first three layers have return_sequences=True, the last one doesn't. Then I do some softmax classification.
When I call model.fit with batch_size=some_number, does TensorFlow/Keras take care of feeding the model batches of the specified size? Do I have to reshape my data in advance somehow? And what happens if the number of samples is not evenly divisible by some_number?
Thanks for your help!
If you provide your data as NumPy arrays to model.fit(), then yes, Keras will take care of feeding the model with the batch size you specified. If your dataset size is not divisible by the batch size, Keras will simply make the final batch smaller, with dataset_size mod batch_size samples.
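A sketch of the setup described in the question, just to show that no reshaping is needed beyond the (50000, 200, 3) array you already have; the optimizer, loss, and number of classes (5) are assumptions on my part:

import numpy as np
from tensorflow import keras

x_train = np.random.rand(50000, 200, 3).astype("float32")
y_train = np.random.randint(0, 5, size=(50000,))  # assuming 5 target classes

model = keras.Sequential([
    keras.layers.LSTM(100, return_sequences=True, input_shape=(200, 3)),
    keras.layers.LSTM(100, return_sequences=True),
    keras.layers.LSTM(100, return_sequences=True),
    keras.layers.LSTM(100),                        # last layer: return_sequences=False
    keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Keras slices the arrays into batches of 64 by itself; the last batch of
# each epoch holds the 50000 % 64 = 16 leftover samples.
model.fit(x_train, y_train, batch_size=64, epochs=3)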

Using noise_shape of the Dropout layer. Batch_size does not fit into provided samples. What to do?

I am using a dropout layer in my model. As I use temporal data, I want the noise_shape to be the same per timestep -> (batch_size, 1, features).
The problem is that if I use a batch size that does not evenly divide the number of samples, I get an error. Example: batch_size = 2, samples = 7. In the last iteration, the batch size (2) is larger than the remaining samples (1).
The other layers (in my case: Masking, Dense, and LSTM) apparently don't have a problem with that and just use a smaller batch for the last, leftover samples.
Concrete error:
Training data shape: [23, 300, 34], batch_size = 3
InvalidArgumentError (see above for traceback): Incompatible shapes:
[2,300,34] vs. [3,1,34] [[Node: dropout_18/cond/dropout/mul =
Mul[T=DT_FLOAT,
_device="/job:localhost/replica:0/task:0/device:CPU:0"](dropout_18/cond/dropout/div,
dropout_18/cond/dropout/Floor)]]
Meaning that the last batch, of shape [2, 300, 34], is incompatible with the noise shape [3, 1, 34].
As I am still in the parameter-tuning phase (does that ever stop? :-) ), the lookback (I'm using LSTMs), the train/val/test split percentages, and the batch size will keep changing constantly. All of these influence the actual length and shape of the training data.
I could try to compute the nearest batch_size that divides the number of samples. For example, if batch_size = 4 and samples = 21, I could reduce batch_size to 3. But if the number of training samples is, say, prime, that would not work either. And if I choose 4, I would really like to get 4.
Am I overcomplicating this? Is there a simple solution without a lot of special-case code?
Thank you
Thanks to nuric in this post, the answer is quite simple.
The current implementation does adjust the noise shape according to the runtime batch size. From the Dropout layer implementation code:
symbolic_shape = K.shape(inputs)
noise_shape = [symbolic_shape[axis] if shape is None else shape
               for axis, shape in enumerate(self.noise_shape)]
So if you give noise_shape=(None, 1, features), the effective shape will be (runtime_batchsize, 1, features), following the code above.
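In practice that means the layer can be written like this (a sketch with made-up layer sizes; features stands for the question's 34-feature dimension):

from tensorflow import keras

features = 34  # as in the (23, 300, 34) training data above

model = keras.Sequential([
    keras.layers.Masking(mask_value=0.0, input_shape=(300, features)),
    # None in the batch axis is resolved to the runtime batch size,
    # so a smaller final batch (e.g. 2 instead of 3) no longer breaks.
    keras.layers.Dropout(0.3, noise_shape=(None, 1, features)),
    keras.layers.LSTM(64),
    keras.layers.Dense(1),
])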
