Which data format does keras model.fit function need?

Which data format does keras model.fit function need? - python

I'm would like to know which data format the model.fit function of keras needs. The documentation is not specific enough for me.
So it seems, that for an LSTM model it needs a 3D array for the parameter x.
Some more specific questions:
Does the data format depend on the chosen model?
What is the meaning of each dimension of x?
And what is the meaning of y?
Thanks in advance for anybody who can tell me a bit about that!
Holger

The data format certainly depends on the model. You can have models that have multiple inputs, such as Siamese networks.
In the case of an LSTM, I believe the input is 2-D as in this example. That example loads data from the IMDB dataset. The relevant line of code there is:
xs = [[oov_char if (w >= num_words or w < skip_top) else w for w in x] for x in xs]
The first dimension corresponds to different examples, and the second dimension is the timestep.
As for y, that refers to the labels. In a sequence to sequence example, this would also be two dimensional with the same [example_index, timestep] indexing. However, in classification it be 1-dimensional with one label for each example.

Related

TimeSeries K-means clustering for multi-dimensional data

I'm using Tslearn's TimeSeriesKmeans library to cluster my dataset with shape (3000,300,8), However the documentation only talks about cases where the dimension of the dataset being (n_samples,timesteps,1)i.e (single feature). Can anybody help me understand if I can perform clustering with a higher dimension?
I'm using "DTW" as my distance metric.

I used TimeSeriesKMeans from tslearn.clustering library. As you mentioned, the only example available on tslearn documentation is using 1 dimension input. However, it is very common to work with time series data with higher dimensions. For instance, in my case, I was clustering human motion which was 30 frames of 135 joint key points for each frame. Therefore, my data shape was like (number_of_samples, number_of_frames, features).
In order to use tslearns's Timeserieskmeans, you need to input an ndarray with (n_sample, m_time_step(sequence_length), k_features(k_dimensions) ).
If you take a look at the documentations, fit function parameters is as follows:
fit(X, y=None)[source] Compute k-means clustering.
Parameters: X : array-like of shape=(n_ts, sz, d) Time series
dataset.
y Ignored
The point is, your input data should be an ndarray with shape of (n_sample, seq_length, n_features) otherwise, it won't work. For example, at the first, my data was like a list of (n_samples,) and each element in that list was like (seq_length, features). It wan't work until I converted it to an ndarray with (n_sample, seq_length, features).

Understanding what exactly neural network is predicting in documentation example (MNIST)

I've taken a quick course in neural networks to better understand them and now I'm trying them out for myself in R. I'm following this documentation of Keras.
The way I understand what is happening:
We are inputting a series of images and transforming these images to numerical matrices based on the arrangement of the pixels and colors in those pixels. We then build a neural network model to learn the pattern of these arrangements, depending on the classification (0 to 9). We then use the model to predict which class an image belongs to. I'll be honest and admit I'm not entirely sure what y_train and x_train is. I simply see it as one training and one validation set so I'm not sure what the difference between x and y is.
My question:
I've followed the steps to the T and the model runs fine and the predictions look like they do in the documentation. Ultimately, the prediction looks like this:
I take this to mean that observation 1 in x_test is predicted to be a category 7.
However, looking at x_test it looks like this:
There is a 0 in every column and row, also if I scroll further down. This is where I get confused. I'm also not sure how I view the original images to view for myself how well they are predicting them. I would eventually like to draw a number myself in paint or so and then see if the model can predict it, but for that I need to first understand what is going on. I feel I am close but I just need a little nudge!

I think if you read more about the input and output layer's dimensions, that would help.
In your example:
Input layer:
A single training example of image has two dimensions 28*28, which is then converted to a single vector of dimension 784. This acts as the input layer for the neural network.
So for m training examples, your input layer will have dimensions (m, 784). Analogically speaking (to traditional ML systems), you can imagine that each pixel of an image is converted into a feature (or x1, x2, ... x784), and your training set is a dataframe with m rows and 784 columns, which is then fed into neural network to compute y_hat = f(x1,x2,x3,...x784).
Output layer:
As an output for our neural network, we want it to predict which number it is from 0 to 9. So for a single training example, the output layer has dimension 10, representing each number from 0 to 9 and for n testing examples the output layer would be a matrix with dimension n*10.
Our y is a vector of length n which would be something like [1,7,8,2,.....] containing true value for each testing example. But to match the dimension of output layer, the y vector's dimension are converted using one-hot encoding. Imagine a length 10 vector, representing number 7 by putting 1 at 7th place and rest of the positions zeros something like [0,0,0,0,0,0,1,0,0,0].
So in your question, if you wish to see the original image, you should be able to see it before reshaping the training examples with something like image(mnist$test$x[1, , ]
Hope this helps!!

y_train are the labels and x_train is the training data, so images in this example. You need to use some kind of plotting library to plot x'es. In this example you probably are not expected to input your own drawings and if you want you would need to preprocess them in the same way as in MNIST and pass them to the model.

Value Error eps=0.100000 as I try to reduce data dimensionaity. What could be the reason for this?

I am trying to use scikit's GaussianRandomProjection with my dataset which has a shape of 1599 x 11 as follows:
transformer = random_projection.GaussianRandomProjection()
X_new = transformer.fit_transform(wine_data.values[:, :11])
As I do this, I get an error that says:
ValueError: eps=0.100000 and n_samples=1599 lead to a
target dimension of 6323 which is larger than the original
space with n_features=1
I do not understand the error. What exactly does it mean? How could I use GaussianRandomProjection to reduce data dimensionality?

Here's a direct quotation from the official Scikit-Learn Doc on GaussianRandomProjection in its parameter n_components:
Dimensionality of the target projection space.
n_components can be automatically adjusted according to the number of
samples in the dataset and the bound given by the
Johnson-Lindenstrauss lemma. In that case the quality of the embedding
is controlled by the eps parameter.
It should be noted that Johnson-Lindenstrauss lemma can yield very
conservative estimated of the required number of components as it
makes no assumption on the structure of the dataset.
It seems that in your case the estimator tends to yield a 6323-dimensional projected target after "reducing" the dimensionality. This is obviously unexpected, because you desired to reduce the dimension other than increasing it. I suggest that you first presume the dimension (i.e. 8) of your desired output and then test if the model works in an expected way.
transformer = GaussianRandomProjection(n_components=8) #Set your desired dimension of the output
X_new = transformer.fit_transform(wine_data.values[:, :11])
Good luck

Using sample_weight in Keras for sequence labelling

I am working on a sequential labeling problem with unbalanced classes and I would like to use sample_weight to resolve the unbalance issue. Basically if I train the model for about 10 epochs, I get great results. If I train for more epochs, val_loss keeps dropping, but I get worse results. I'm guessing the model just detects more of the dominant class to the detriment of the smaller classes.
The model has two inputs, for word embeddings and character embeddings, and the input is one of 7 possible classes from 0 to 6.
With the padding, the shape of my input layer for word embeddings is (3000, 150) and the input layer for word embeddings is (3000, 150, 15). I use a 0.3 split for testing and training data, which means X_train for word embeddings is (2000, 150) and (2000, 150, 15) for char embeddings. y contains the correct class for each word, encoded in a one-hot vector of dimension 7, so its shape is (3000, 150, 7). y is likewise split into a training and testing set. Each input is then fed into a Bidirectional LSTM.
The output is a matrix with one of the 7 categories assigned for each word of the 2000 training samples, so the size is (2000, 150, 7).
At first, I simply tried to define sample_weight as an np.array of length 7 containing the weights for each class:
count = [list(array).index(1) for arrays in y for array in arrays]
count = dict(Counter(count))
count[0] = 0
total = sum([count[key] for key in count])
count = {k: count[key] / total for key in count}
category_weights = np.zeros(7)
for f in count:
category_weights[f] = count[f]
But I get the following error ValueError: Found a sample_weight array with shape (7,) for an input with shape (2000, 150, 7). sample_weight cannot be broadcast.
Looking at the docs, it looks like I should instead be passing a 2D array with shape (samples, sequence_length). So I create a (3000, 150) array with a concatenation of the weights of every word of each sequence:
weights = []
for sample in y:
current_weight = []
for line in sample:
current_weight.append(frequency[list(line).index(1)])
weights.append(current_weight)
weights = np.array(weights)
and pass that to the fit function through the sample_weight parameter after having added the sample_weight_mode="temporal" option in compile().
I first got an error telling me the dimension was wrong, however after generating the weights for only the training sample, I end up with a (2000, 150) array that I can use to fit my model.
Is this a proper way to define sample_weights or am I doing it all wrong ? I can't say I've noticed any improvements from adding the weights, so I must have missed something.

I think you are confusing sample_weights and class_weights. Checking the docs a bit we can see the differences between them:
sample_weights is used to provide a weight for each training sample. That means that you should pass a 1D array with the same number of elements as your training samples (indicating the weight for each of those samples). In case you are using temporal data you may instead pass a 2D array, enabling you to give weight to each timestep of each sample.
class_weights is used to provide a weight or bias for each output class. This means you should pass a weight for each class that you are trying to classify. Furthermore, this parameter expects a dictionary to be passed to it (not an array, that is why you got that error). For example consider this situation:
class_weight = {0 : 1. , 1: 50.}
In this case (a binary classification problem) you are giving 50 times as much weight (or "relevance") to your samples of class 1 compared to class 0. This way you can compensate for imbalanced datasets. Here is another useful post explaining more about this and other options to consider when dealing with imbalanced datasets.
If I train for more epochs, val_loss keeps dropping, but I get worse results.
Probably you are over-fitting, and something that may be contributing to that is the imbalanced classes your dataset has, as you correctly suspected. Compensating the class weights should help mitigate this, however there may still be other factors that can cause over-fitting that escape the scope of this question/answer (so make sure to watch out for those after solving this question).
Judging by your post, seems to me that what you need is to use class_weight to balance your dataset for training, for which you will need to pass a dictionary indicating the weight ratios between your 7 classes. Consider using sample_weight only if you want to give each sample a custom weight for consideration.
If you want a more detailed comparison between those two consider checking this answer I posted on a related question. Spoiler: sample_weight overrides class_weight, so you have to use one or the other, but not both, so be careful with not mixing them.
Update: As of the moment of this edit (March 27, 2020), looking at the source code of training_utils.standardize_weights() we can see that it now supports both class_weights and sample_weights:
Everything gets normalized to a single sample-wise (or timestep-wise)
weight array. If both sample_weights and class_weights are provided,
the weights are multiplied together.

I searched online for the same question and I did have good accuracy improvement after using sample_weight correctly in my case.
I think your understanding is correct and the procedure is also correct. One possible reason that you don't have improvements in your case is that, when you pass in the sample_weight, higher value means higher weight. This means that you cannot use word count directly. You might consider to use the inverted count frequency:
total = sum([count[key] for key in count])
count = {k: count[key] / total for key in count}
for f in count:
category_weights = np.zeros(7)
category_weights[f] = 1 - count[f]

TensorFlow Multi-Layer Perceptron

I am learning TensorFlow, and my goal is to implement MultiPerceptron for my needs. I checked the MNIST tutorial with MultiPerceptron implementation and everything was clear to me except this:
_, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
y: batch_y})
I guess, x is an image itself(28*28 pixels, so the input is 784 neurons) and y is a label which is an 1x10 array:
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
They feed whole batches (which are packs of data points and labels)! How does tensorflow interpret this "batch" input? And how does it update the weights: simultaneously after each element in a batch, or after running through the whole batch?
And, if I need to input one number (input_shape = [1,1]) and output four numbers (output_shape = [1,4]), how should I change the tf.placeholders and in which form should I feed them into session?
When I ask, how does tensorflow interpret it, I want to know how tensorflow splits the batch into single elements. For example, batch is a 2-D array, right? In which direction does it split an array? Or it uses matrix operations and doesn't split anything?
When I ask, how should I feed my data, I want to know, should it be a 2-D array with samples at its rows and features at its columns, or, maybe, could it be a 2-D list.
When I feed my float numpy array X_train to x, which is :
x = tf.placeholder("float", [1, n_input])
I receive an error:
ValueError: Cannot feed value of shape (1, 18) for Tensor 'Placeholder_10:0', which has shape '(1, 1)'
It appears that I have to create my data as a Tensor too?
When I tried [18x1]:
Cannot feed value of shape (18, 1) for Tensor 'Placeholder_12:0', which has shape '(1, 1)'

They feed whole bathces(which are packs of data points and labels)!
Yes, this is how neural networks are usually trained (due to some nice mathematical properties of having best of two worlds - better gradient approximation than in SGD on one hand and much faster convergence than full GD).
How does tensorflow interpret this "batch" input?
It "interprets" it according to operations in your graph. You probably have reduce mean somewhere in your graph, which calculates average over your batch, thus causing this to be the "interpretation".
And how does it update the weights: 1.simultaniusly after each element in a batch? 2. After running threw the whole batch?.
As in the previous answer - there is nothing "magical" about batch, it is just another dimension, and each internal operation of neural net is well defined for the batch of data, thus there is still a single update in the end. Since you use reduce mean operation (or maybe reduce sum?) you are updating according to mean of the "small" gradients (or sum if there is reduce sum instead). Again - you could control it (up to the agglomerative behaviour, you cannot force it to do per-sample update unless you introduce while loop into the graph).
And, if i need to imput one number(input_shape = [1,1]) and ouput four nubmers (output_shape = [1,4]), how should i change the tf.placeholders and in which form should i feed them into session? THANKS!!
just set the variables, n_input=1 and n_classes=4, and you push your data as before, as [batch, n_input] and [batch, n_classes] arrays (in your case batch=1, if by "1x1" you mean "one sample of dimension 1", since your edit start to suggest that you actually do have a batch, and by 1x1 you meant a 1d input).
EDIT: 1.when i ask, how does tensorflow interpret it, i want to know, how tensorflow split the batch into single elements. For example, batch is a 2-D array, right? In which direction it splits an array. Or it uses matrix operations and doesnt split anything? 2. When i ask, how should i feed my data, i want to know, should it be a 2-D array with samples at its rows and features at its colums, or, maybe, could it be a 2-D list.
It does not split anything. It is just a matrix, and each operation is perfectly well defined for matrices as well. Usually you put examples in rows, thus in first dimension, and this is exactly what [batch, n_inputs] says - that you have batch rows each with n_inputs columns. But again - there is nothing special about it, and you could also create a graph which accepts column-wise batches if you would really need to.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.