I am new to deep learning and I am struggling with data formats in Keras. My CNN is based on Stacked Hourglass Networks for Human Pose Estimation by A. Newell et al.
In this network the input is a 256x256 RGB image and the output should be a 64x64 heatmap highlighting body joints (shoulder, knee, ...). I managed to build the network and I have all the data (images) with their annotations (pixel labels for body joints). How should I format the input and output data of the training set to train my model? Currently I use a numpy array of shape (256,256,3) for an image, and I don't know how to format my output. Should I create an array of shape [n,64,64,7]? (n being the size of the training set and 7 the number of filters I use to obtain a heatmap for 7 joints)
Thank you for your time.
The output can also be a numpy array.
Consider this example:
Training set: 50 images of size 256x256x3. These can be combined into a single numpy array of shape (50, 256, 256, 3).
The output data can be formatted with a similar approach.
Sample code below:
import numpy as np
# a, b and c are arrays of shape (256, 256, 3)
temp = []
temp.append(a)
temp.append(b)
temp.append(c)
# stack the arrays along a new leading axis
output_labels = np.stack(temp)
The output_labels array will have shape (3, 256, 256, 3).
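For the heatmap targets in the question, the same stacking idea gives an array of shape (n, 64, 64, 7). The sketch below is one possible way to build it, under the assumption that each joint is rendered as a 2D Gaussian centred on its annotated position. The function name make_heatmaps, the sigma value and the coordinate layout are illustrative choices, not something taken from the question or from Keras.

```python
import numpy as np

def make_heatmaps(joints, size=64, sigma=1.5):
    """Render one Gaussian heatmap per joint.

    joints: array of shape (num_joints, 2) holding (x, y) positions
            already scaled to the size x size output grid (assumption).
    Returns an array of shape (size, size, num_joints).
    """
    ys, xs = np.mgrid[0:size, 0:size]  # pixel row and column indices
    maps = []
    for x, y in joints:
        maps.append(np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2)))
    return np.stack(maps, axis=-1)

# Hypothetical annotations: 3 training images, 7 joints each.
rng = np.random.default_rng(0)
all_joints = rng.uniform(0, 63, size=(3, 7, 2))

# Stack the per-image heatmaps along a new leading axis.
y_train = np.stack([make_heatmaps(j) for j in all_joints])
print(y_train.shape)  # (3, 64, 64, 7)
```

Each channel of y_train then holds the heatmap for one joint, which matches the [n,64,64,7] layout proposed in the question.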
Keras recommends creating a data generator to feed the training data and ground truth to the network.
For the stacked hourglass network specifically, you can refer to my implementation for details: https://github.com/yuanyuanli85/Stacked_Hourglass_Network_Keras/tree/master/src/data_gen
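As a rough illustration of the generator idea (this is not the code from the linked repository), a plain Python generator can yield batches of images and ground-truth arrays indefinitely. The name batch_generator, the batch size and the placeholder data are assumptions made for this sketch.

```python
import numpy as np

def batch_generator(images, targets, batch_size=8, seed=0):
    """Yield (image_batch, target_batch) pairs forever, reshuffling each epoch."""
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield images[idx], targets[idx]

# Placeholder data: 20 RGB images and 20 seven-channel heatmap targets.
images = np.zeros((20, 256, 256, 3), dtype=np.float32)
targets = np.zeros((20, 64, 64, 7), dtype=np.float32)

gen = batch_generator(images, targets, batch_size=8)
x_batch, y_batch = next(gen)
print(x_batch.shape, y_batch.shape)  # (8, 256, 256, 3) (8, 64, 64, 7)
```

A generator like this can typically be passed to model.fit together with a steps_per_epoch value, so the whole dataset never has to be turned into one large array up front.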
Recently I have been learning TensorFlow, and I have written a few machine learning programs. However, I am wondering how I can test the model on a single input and receive a prediction, rather than just evaluating the accuracy of the model on a lot of data as you would with the model.fit() function. I am also wondering how I can then use the model in a script that, for example, gathers data, feeds it into the model automatically to obtain predictions, and then plots the results on a graph.
Thanks in advance.
To use your trained model on a single input, let's call it y, you must process y to have the same data format your model was trained on. For example, let's assume that you trained a model on images of cats and dogs. If your model trained properly, you should be able to submit a picture of a cat or a dog to it and have it tell you which it is.
Now, if images were the input used to train the model, they had a certain shape (height, width) and a certain channel format, for example RGB or grayscale. So for the image y you want to predict, you must ensure its height and width are the same as those the model was trained on. If the model was trained on RGB images, then y must be an RGB image. One more thing: when using model.predict on the single image y, you have to account for the fact that model.predict requires the first dimension of y to be the batch dimension. For a single image the batch size is 1, so you need to expand the dimensions of y to include it. For an image, the shape of y is (height, width, channels). It doesn't have a batch dimension, so you need to add it. You can do that with
y=np.expand_dims(y,axis=0), which gives y the shape (1, height, width, channels). For example, let's assume you trained your model on images of shape (224,224,3) in RGB format. You have an image y you want to classify, and say it is in a directory my_pics. The code below shows how to do a prediction on image y. Somewhere in your training code you need to have an ordered list called classes. For the cat and dog example, the index for cat might be 0 and the index for dog would then be 1, so classes would be classes=['cat', 'dog']
import cv2
import numpy as np
import tensorflow as tf
classes=['cat', 'dog'] # ordered list of class names from training
model=tf.keras.models.load_model(model_path) # model_path is the path where the model is stored
image_path=r'my_pics' # path to image y
y=cv2.imread(image_path) # note: cv2 reads images in BGR order
y=cv2.resize(y, (224,224)) # gives y the same shape as the training images
y=cv2.cvtColor(y, cv2.COLOR_BGR2RGB) # convert from BGR to RGB
y=np.expand_dims(y, axis=0) # y has shape (1,224,224,3)
prediction=model.predict(y) # make a prediction on y
print(prediction) # an array with a probability value for each class
class_index=np.argmax(prediction) # index of the entry in prediction with the highest probability
klass=classes[class_index] # select the class name from the ordered list of classes
print(klass)
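The batch dimension and the final class lookup can be tried without a trained model. In this sketch an all-zero array stands in for the image and a hand-written probability array stands in for the output of model.predict; both are placeholders, not real data.

```python
import numpy as np

classes = ['cat', 'dog']

# Stand-in for a resized RGB image of shape (height, width, channels).
y = np.zeros((224, 224, 3), dtype=np.uint8)
y = np.expand_dims(y, axis=0)  # add the batch dimension
print(y.shape)  # (1, 224, 224, 3)

# Stand-in for model.predict(y): one row of class probabilities per image.
prediction = np.array([[0.2, 0.8]])
class_index = int(np.argmax(prediction, axis=1)[0])
print(classes[class_index])  # dog
```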
I've taken a quick course in neural networks to better understand them and now I'm trying them out for myself in R. I'm following this documentation of Keras.
The way I understand what is happening:
We are inputting a series of images and transforming these images into numerical matrices based on the arrangement of the pixels and the colors in those pixels. We then build a neural network model to learn the pattern of these arrangements, depending on the classification (0 to 9). We then use the model to predict which class an image belongs to. I'll be honest and admit I'm not entirely sure what y_train and x_train are. I simply see them as one training and one validation set, so I'm not sure what the difference between x and y is.
My question:
I've followed the steps to the T and the model runs fine and the predictions look like they do in the documentation. Ultimately, the prediction looks like this:
I take this to mean that observation 1 in x_test is predicted to be a category 7.
However, looking at x_test it looks like this:
There is a 0 in every column and row, also if I scroll further down. This is where I get confused. I'm also not sure how I view the original images to view for myself how well they are predicting them. I would eventually like to draw a number myself in paint or so and then see if the model can predict it, but for that I need to first understand what is going on. I feel I am close but I just need a little nudge!
I think if you read more about the input and output layer's dimensions, that would help.
In your example:
Input layer:
A single training example is an image with two dimensions, 28*28, which is then converted to a single vector of dimension 784. This acts as the input layer for the neural network.
So for m training examples, your input will have dimensions (m, 784). By analogy with traditional ML systems, you can imagine that each pixel of an image is converted into a feature (x1, x2, ... x784), and your training set is a dataframe with m rows and 784 columns, which is then fed into the neural network to compute y_hat = f(x1,x2,x3,...x784).
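The flattening step can be illustrated in Python with NumPy. The question uses R, so this is only an analogous sketch with placeholder data:

```python
import numpy as np

m = 5  # number of example images (placeholder)
x = np.zeros((m, 28, 28))  # m images of 28x28 pixels

x_flat = x.reshape(m, 28 * 28)  # one row of 784 features per image
print(x_flat.shape)  # (5, 784)
```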
Output layer:
As the output of our neural network, we want it to predict which number it is from 0 to 9. So for a single example, the output layer has dimension 10, one entry for each number from 0 to 9, and for n testing examples the output would be a matrix with dimension n*10.
Our y is a vector of length n, something like [1,7,8,2,.....], containing the true value for each testing example. But to match the dimension of the output layer, the y vector is converted using one-hot encoding. Imagine a length-10 vector representing the number 7 by putting a 1 at the position for 7 (the eighth entry, since the classes start at 0) and zeros everywhere else, something like [0,0,0,0,0,0,0,1,0,0].
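One-hot encoding can be sketched in Python with NumPy (again only an analogy to the R workflow). Note that with classes 0 to 9 the digit 7 sits at index 7, which is the eighth entry of the vector.

```python
import numpy as np

labels = np.array([1, 7, 8, 2])  # true digit for each example
one_hot = np.eye(10, dtype=int)[labels]  # row i is the encoding of labels[i]

print(one_hot.shape)  # (4, 10)
print(one_hot[1])  # [0 0 0 0 0 0 0 1 0 0]
```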
So in your question, if you wish to see the original image, you should be able to see it before reshaping the training examples with something like image(mnist$test$x[1, , ])
Hope this helps!!
y_train holds the labels and x_train is the training data, so the images in this example. You need to use some kind of plotting library to plot the x's. In this example you are probably not expected to input your own drawings, but if you want to, you would need to preprocess them in the same way as MNIST and pass them to the model.
I have a numpy array representation of an image and I want to turn it into a tensor so I can feed it through my pytorch neural network.
I understand that the network takes tensors arranged as [3,100,100] rather than [100,100,3], with the pixels rescaled, and that the images must be in batches.
So I did the following:
import cv2
import numpy as np
import torch
my_img = cv2.imread('testset/img0.png')
my_img.shape #returns [100,100,3] a 3 channel image with 100x100 resolution
my_img = np.transpose(my_img,(2,0,1))
my_img.shape #returns [3,100,100]
#convert the numpy array to tensor
my_img_tensor = torch.from_numpy(my_img)
#rescale to be [0,1] like the data it was trained on by default
my_img_tensor *= (1/255)
#turn the tensor into a batch of size 1
my_img_tensor = my_img_tensor.unsqueeze(0)
#send image to gpu
my_img_tensor.to(device)
#put forward through my neural network.
net(my_img_tensor)
However this returns the error:
RuntimeError: _thnn_conv2d_forward is not implemented for type torch.ByteTensor
The problem is that the input you give to your network is of type ByteTensor, while only floating-point operations are implemented for conv-like operations. Try the following:
my_img_tensor = my_img_tensor.type('torch.DoubleTensor')
# for converting to double tensor
Source PyTorch Discussion Forum
Thanks to AlbanD
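A complementary sketch, assuming the network uses PyTorch's default float32 weights (in which case the input generally has to be float32 as well, not double): the cast and rescaling can be done in NumPy before the tensor is created, since torch.from_numpy keeps the array's dtype. The function name prepare_image is made up for this example, and a constant array stands in for the output of cv2.imread.

```python
import numpy as np

def prepare_image(img_hwc_uint8):
    """Turn an (H, W, C) uint8 image into a (1, C, H, W) float32 batch in [0, 1]."""
    img = img_hwc_uint8.astype(np.float32) / 255.0  # cast first, then rescale
    img = np.transpose(img, (2, 0, 1))              # (H, W, C) -> (C, H, W)
    return np.ascontiguousarray(img[np.newaxis])    # add the batch dimension

my_img = np.full((100, 100, 3), 255, dtype=np.uint8)  # stand-in for cv2.imread(...)
batch = prepare_image(my_img)
print(batch.shape, batch.dtype)  # (1, 3, 100, 100) float32
```

Passing batch to torch.from_numpy then yields a float tensor. Note also that my_img_tensor.to(device) returns a new tensor rather than modifying the existing one, so its result needs to be assigned, for example my_img_tensor = my_img_tensor.to(device).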
I am trying to have a look at the MNIST data set for machine learning. In Tensorflow the MNIST data set can be imported with
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
full_data_x = mnist.train.images
However when I try to visualize an 80x80 array of the data using
test_x, test_y = mnist.test.images, mnist.test.labels
plt.gray()
plt.imshow(1-test_x[80:160,80:160])
it looks really strange like this:
How can I extract an image of the actual hand-written digits, as they are shown on the internet:
I saw similar questions like this one. However, I would especially like to know where in the training data array the images are actually hidden. I know that the TensorFlow module provides a function to display the images.
I think I understand your question now, and it is a bit different than the one I thought was duplicate.
The images are not necessarily hidden. Each index of that list is an image in itself:
num_test_images, num_train_images = len(mnist.test.images), len(mnist.train.images)
size_of_first_test_image, size_of_first_train_image = len(mnist.test.images[0]), len(mnist.train.images[0])
print(num_test_images, num_train_images)
print(size_of_first_test_image, size_of_first_train_image)
output:
10000 55000
784 784
You can see that the number of training and testing images is the length of each mnist list. Each image is a flat array of size 784. You will have to reshape it yourself to display it using numpy or something of the sort.
Try this:
import numpy as np
first_test_image = np.array(mnist.test.images[0], dtype='float')
reshaped_image = first_test_image.reshape((28, 28))
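To make the reshaping concrete without downloading MNIST, the sketch below uses a random array with the same layout as mnist.test.images (one flat 784-value row per image). It also shows why a slice such as test_x[80:160, 80:160] does not give a digit: it selects 80 different images and 80 pixel columns from each, not an 80x80 picture.

```python
import numpy as np

rng = np.random.default_rng(0)
test_x = rng.random((200, 784))  # placeholder with the layout of mnist.test.images

first_test_image = test_x[0].reshape((28, 28))  # one image as a 28x28 grid
print(first_test_image.shape)  # (28, 28)

patch = test_x[80:160, 80:160]  # rows are images 80..159, columns are pixels 80..159
print(patch.shape)  # (80, 80)

# With matplotlib installed, plt.imshow(first_test_image, cmap='gray') would display it.
```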
I am trying to extract features from audio files using Librosa, to feed to a CNN as Numpy arrays.
Currently I save a single feature at a time to feed into the CNN. I save two-dimensional (single-channel) log-scaled mel-spectrogram features in Python using Librosa:
import librosa
import numpy as np
def build_features():
    y, sr = librosa.load("audio.wav")
    mel = librosa.feature.melspectrogram(
        y=y,
        sr=sr,
        n_fft=4096,
        n_mels=128, # Mel bins
        hop_length=2048,
    )
    logamplitude = librosa.amplitude_to_db
    # add a leading batch axis and a trailing channel axis
    logspec = logamplitude(mel, ref=1.0)[np.newaxis, :, :, np.newaxis]
    return logspec
This gives the shape (1,128,323,1).
I would like to add another feature, let's say a tempogram. I can do this using the same code, replacing melspectrogram with tempogram and setting the window length to 128.
This gives me a tempogram shape of (1,128,323,1).
Now I would like to "stack" these 2 feature layers into a multi-channel numpy object that I can feed into a CNN in Keras.
How should I code this?
EDIT:
Think I figured it out, using np.vstack()
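One thing worth checking with np.vstack() is the resulting shape: it joins arrays along the first axis, which here is the batch axis. If the goal is a two-channel input, joining along the last axis is one option. The sketch below uses zero arrays as stand-ins for the two features and assumes a channels-last layout.

```python
import numpy as np

logspec = np.zeros((1, 128, 323, 1))    # stand-in for the log-mel spectrogram
tempogram = np.zeros((1, 128, 323, 1))  # stand-in for the tempogram

stacked_v = np.vstack([logspec, tempogram])
print(stacked_v.shape)  # (2, 128, 323, 1): two single-channel samples

stacked_c = np.concatenate([logspec, tempogram], axis=-1)
print(stacked_c.shape)  # (1, 128, 323, 2): one sample with two channels
```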