I am trying to have a look at the MNIST data set for machine learning. In Tensorflow the MNIST data set can be imported with
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
full_data_x = mnist.train.images
However, when I try to visualize an 80x80 slice of the data using
test_x, test_y = mnist.test.images, mnist.test.labels
plt.gray()
plt.imshow(1-test_x[80:160,80:160])
the result looks really strange.
How can I extract an image of the actual handwritten digits, like the ones shown on the internet?
I have seen similar questions. However, I am especially interested in where in the training data array the images are actually stored. I know that the TensorFlow module provides a function to display the images.
I think I understand your question now, and it is a bit different than the one I thought was duplicate.
The images are not necessarily hidden. Each index of that list is an image in itself:
num_test_images, num_train_images = len(mnist.test.images), len(mnist.train.images)
size_of_first_test_image, size_of_first_train_image = len(mnist.test.images[0]), len(mnist.train.images[0])
print(num_test_images, num_train_images)
print(size_of_first_test_image, size_of_first_train_image)
output:
10000 55000
784 784
You can see that the number of training and testing images is the length of each mnist list. Each image is a flat array of size 784. You will have to reshape it yourself to display it using numpy or something of the sort.
Try this:
first_test_image = np.array(mnist.test.images[0], dtype='float')
reshaped_image = first_test_image.reshape((28, 28))
plt.imshow(reshaped_image, cmap='gray')
plt.show()
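To see why the slice in the question looks strange, a small sketch with a stand-in array may help. It assumes only NumPy; `fake_images` is a hypothetical placeholder with the same layout as mnist.test.images (one flattened 28x28 image per row), not the real data.

```python
import numpy as np

# Hypothetical stand-in for mnist.test.images: 200 images, each flattened to 784 values.
fake_images = np.arange(200 * 784, dtype=np.float32).reshape(200, 784)

# The slice from the question selects images 80..159 and, from each of them,
# only the flat pixel positions 80..159. It is a strip cut across 80 different
# images, not a picture of any single digit.
strange = fake_images[80:160, 80:160]

# A single image is one row; reshaping that row recovers the 28x28 grid.
one_image = fake_images[80].reshape(28, 28)

print(strange.shape)    # (80, 80)
print(one_image.shape)  # (28, 28)
```

With the real data, the same idea would be plt.imshow(mnist.test.images[i].reshape(28, 28)) for whichever index i you want to look at.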
I have a code that generates an iterator from a Tensorflow dataset. The code is this:
@tf.function
def normalize_image(record):
    out = record.copy()
    out['image'] = tf.cast(out['image'], 'float32') / 255.
    return out
train_it = iter(tfds.builder('mnist').as_dataset(split='train').map(normalize_image).repeat().batch(256*10))
However, I want to split the data manually. For example, the MNIST dataset has 60000 training samples, but I want to use only the first 50000 (and hold out the rest for validation). The problem is that I don't know how to do so.
I tried to convert it to NumPy and split based on that, but then I couldn't apply the map to it.
ds_builder = tfds.builder('mnist')
print(dir(ds_builder))
ds_builder.download_and_prepare()
train_ds = tfds.as_numpy(ds_builder.as_dataset(split='train', batch_size=-1))
train_ds['image'] = train_ds['image'][0:50000, : , :]
train_ds['label'] = train_ds['label'][0:50000]
I was wondering how to do so.
P.S: The ordering of data is also important for me, so I was thinking of loading all data in Numpy and saving the required ones in png and loading with tfds, but I'm not sure if it keeps the original order or not. I want to take the first 50000 samples of the whole 60000 samples.
Thanks.
train_ds = tfds.builder('mnist').as_dataset(split='train').map(normalize_image)
train_ds = train_ds.take(50000).repeat().batch(256*10)
val_ds = tfds.builder('mnist').as_dataset(split='train').map(normalize_image)
val_ds = val_ds.skip(50000).batch(256*10)
train_it = iter(train_ds)
val_it = iter(val_ds)
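The same order-preserving split can also be done on in-memory arrays, in which case the normalization is a plain array operation rather than a map call. The sketch below assumes NumPy only and uses a small hypothetical dictionary (`data`) shaped like the output of tfds.as_numpy(...) in the question, with 60 random samples standing in for the 60000 real ones.

```python
import numpy as np

def split_and_normalize(data, n_train):
    """Keep the first n_train samples for training and the rest for validation."""
    def normalize(images):
        # Same scaling as normalize_image in the question: uint8 -> float32 in [0, 1].
        return images.astype(np.float32) / 255.0

    train = {"image": normalize(data["image"][:n_train]),
             "label": data["label"][:n_train]}
    val = {"image": normalize(data["image"][n_train:]),
           "label": data["label"][n_train:]}
    return train, val

# Hypothetical stand-in data: 60 random "images" with labels 0..9 repeating.
rng = np.random.default_rng(0)
data = {
    "image": rng.integers(0, 256, size=(60, 28, 28, 1), dtype=np.uint8),
    "label": np.arange(60) % 10,
}

train, val = split_and_normalize(data, n_train=50)
print(train["image"].shape, val["image"].shape)  # (50, 28, 28, 1) (10, 28, 28, 1)
```

Slicing never reorders samples, so the first 50 rows of the input are exactly the training rows here.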
I am trying out a YOLO model in Python.
To process the data and annotations, I take the data in batches.
batchsize = 50
#boxList= []
#boxArr = np.empty(shape = (0,26,5))
for i in range(0, len(box_list), batchsize):
    boxList = box_list[i:i+batchsize]
    imagesList = image_list[i:i+batchsize]
    #to convert the annotation from VOC format
    convertedBox = np.array([np.array(get_boxes_for_id(box_l)) for box_l in boxList])
    #pre-process the images and annotations
    image_data, boxes = process_input_data(imagesList,max_boxes,convertedBox)
    boxes = np.array(list(itertools.chain.from_iterable(boxes)))
    detectors_mask, matching_true_boxes = get_detector_mask(boxes, anchors)
After this, I want to pass my data to the model for training.
When I append the batches to a list, I get a memory error because of the array size.
When I append them to an array, I get a dimensionality error because of the shapes.
How can I train on this data, and which should I use: model.fit() or model.train_on_batch()?
If you are using Keras to train your model on a set of images, you can use a training generator and a validation generator; all you have to do is put your images in their respective class folders. Also take a look at https://keras.io/preprocessing/image/, which may help. I hope this answers your question, unless I have misunderstood it.
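One way to avoid accumulating every processed batch in a list or array is to produce batches lazily and hand each one to the model as soon as it is ready. The sketch below is a plain Python generator under stated assumptions: `preprocess` is a hypothetical stand-in for the question's process_input_data / get_detector_mask steps, the images and boxes are dummy arrays, and the model call is only indicated in a comment.

```python
import numpy as np

def batch_generator(images, boxes, batch_size, preprocess):
    """Yield one preprocessed batch at a time instead of storing all of them."""
    for start in range(0, len(images), batch_size):
        stop = start + batch_size
        yield preprocess(images[start:stop], boxes[start:stop])

def preprocess(image_batch, box_batch):
    # Hypothetical placeholder for the real image/annotation processing.
    return (np.asarray(image_batch, dtype=np.float32),
            np.asarray(box_batch, dtype=np.float32))

# Dummy data: 120 tiny "images", each with 2 boxes of 5 values.
images = [np.zeros((4, 4, 3)) for _ in range(120)]
boxes = [np.zeros((2, 5)) for _ in range(120)]

batch_shapes = []
for x_batch, y_batch in batch_generator(images, boxes, 50, preprocess):
    # A Keras model could be updated here, e.g. model.train_on_batch(x_batch, y_batch).
    batch_shapes.append((x_batch.shape, y_batch.shape))

print(len(batch_shapes))  # 3
```

Only one batch is held in memory at a time, and the last batch is simply smaller when the data does not divide evenly.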
I have been trying to find a way to load the EMNIST-letters dataset but without much success. I have found interesting stuff in the structure and can't wrap my head around what is happening. Here is what I mean:
I downloaded the dataset in .mat format.
I can load the data using
import scipy.io
mat = scipy.io.loadmat('letter_data.mat') # renamed for convenience
It is a dictionary with the following keys:
dict_keys(['__header__', '__version__', '__globals__', 'dataset'])
The only key of interest is dataset, which I haven't been able to extract data from. Printing its shape gives this:
>>> print(mat['dataset'].shape)
(1, 1)
I dug deeper and deeper to find a shape that looks somewhat like a real dataset and came across this:
>>> print(mat['dataset'][0][0][0][0][0][0].shape)
(124800, 784)
which is exactly what I wanted, but I can't find the labels or the test data. I have tried many things but can't seem to understand the structure of this dataset.
If someone could tell me what is going on here, I would appreciate it.
Because of the way the dataset is structured, the array of image arrays can be accessed with mat['dataset'][0][0][0][0][0][0] and the array of label arrays with mat['dataset'][0][0][0][0][0][1]. For instance, print(mat['dataset'][0][0][0][0][0][0][0]) will print out the pixel values of the first image, and print(mat['dataset'][0][0][0][0][0][1][0]) will print the first image's label.
For a less...convoluted dataset, I'd actually recommend using the CSV version of the EMNIST dataset on Kaggle: https://www.kaggle.com/crawford/emnist, where each row is a separate image, there are 785 columns where the first column = class_label and each column after represents one pixel value (784 total for a 28 x 28 image).
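With a layout like the one described (label first, then 784 pixel values), separating labels from images is a matter of column slicing. The sketch below assumes NumPy and uses two made-up rows in place of the real file; the orientation of the resulting images is not checked here.

```python
import io
import numpy as np

# Two made-up CSV rows: a class label followed by 784 pixel values each.
csv_text = "\n".join(
    ",".join(str(v) for v in [label] + [label] * 784) for label in (3, 7)
)

# For the real file, pass its path instead of the StringIO object.
rows = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=np.int64, ndmin=2)

labels = rows[:, 0]                       # first column: class label
images = rows[:, 1:].reshape(-1, 28, 28)  # remaining 784 columns: pixels

print(labels.tolist())  # [3, 7]
print(images.shape)     # (2, 28, 28)
```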
@Josh Payne's answer is correct, but I'll expand on it for those who want to use the .mat file with an emphasis on typical data splits.
The data itself has already been split up in to a training and test set. Here's how I accessed the data:
from scipy import io as sio
mat = sio.loadmat('emnist-letters.mat')
data = mat['dataset']
X_train = data['train'][0,0]['images'][0,0]
y_train = data['train'][0,0]['labels'][0,0]
X_test = data['test'][0,0]['images'][0,0]
y_test = data['test'][0,0]['labels'][0,0]
There is an additional field 'writers' (e.g. data['train'][0,0]['writers'][0,0]) that distinguishes the original sample writer. Finally, there is another field data['mapping'], but I'm not sure what it is mapping the digits to.
In addition, in Section II D, the EMNIST paper states that "the last portion of the training set, equal in size to the testing set, is set aside as a validation set". Strangely, the .mat file training/testing size does not match the number listed in Table II, but it does match the size in Fig. 2.
val_start = X_train.shape[0] - X_test.shape[0]
X_val = X_train[val_start:X_train.shape[0],:]
y_val = y_train[val_start:X_train.shape[0]]
X_train = X_train[0:val_start,:]
y_train = y_train[0:val_start]
If you don't want a validation set it is fine to leave these samples in the training set.
Also, if you would like to reshape the data into 2D 28x28 images instead of 1D arrays of 784 values, you'll need to do a NumPy reshape using Fortran ordering to get the correct image orientation (MATLAB uses column-major ordering, just like Fortran), e.g.:
X_train = X_train.reshape( (X_train.shape[0], 28, 28), order='F')
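To make the effect of order='F' concrete, here is a small sketch on a toy array (two "images" of 3x3 instead of 28x28, which is an assumption made only to keep the output readable). It shows that the Fortran-order reshape gives each image transposed relative to the default C-order reshape.

```python
import numpy as np

# Toy stand-in for X_train: 2 samples, each a flattened 3x3 "image".
X = np.arange(18).reshape(2, 9)

c_images = X.reshape((2, 3, 3))             # default row-major (C) order
f_images = X.reshape((2, 3, 3), order="F")  # column-major (Fortran) order

# Each Fortran-order image is the transpose of the corresponding C-order image.
same_as_transpose = np.array_equal(f_images, c_images.transpose(0, 2, 1))

print(c_images[0])
print(f_images[0])
print(same_as_transpose)  # True
```

So if the data has already been reshaped in the default order, swapping the last two axes should give the same result as reshaping with order='F'.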
An alternative solution is to use the EMNIST python package. (Full details at https://pypi.org/project/emnist/)
This lets you pip install emnist in your environment then import the datasets (they will download when you run the program for the first time).
Example from the site:
>>> from emnist import extract_training_samples
>>> images, labels = extract_training_samples('digits')
>>> images.shape
(240000, 28, 28)
>>> labels.shape
(240000,)
You can also list the datasets
>>> from emnist import list_datasets
>>> list_datasets()
['balanced', 'byclass', 'bymerge', 'digits', 'letters', 'mnist']
And replace 'digits' in the first example with your choice.
This gives you all the data in numpy arrays which I have found makes things easy to work with.
I suggest downloading the 'Binary format as the original MNIST dataset' from the Yann LeCun website.
Unzip the downloaded file, then in Python:
import idx2numpy
X_train = idx2numpy.convert_from_file('./emnist-letters-train-images-idx3-ubyte')
y_train = idx2numpy.convert_from_file('./emnist-letters-train-labels-idx1-ubyte')
X_test = idx2numpy.convert_from_file('./emnist-letters-test-images-idx3-ubyte')
y_test = idx2numpy.convert_from_file('./emnist-letters-test-labels-idx1-ubyte')
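For readers curious about what is being decoded here, the sketch below parses the IDX layout by hand for the unsigned-byte case only: a 4-byte magic number (two zero bytes, a type code, the number of dimensions), then one big-endian 32-bit size per dimension, then the raw data. `parse_idx_ubyte` is a hypothetical helper written for illustration, not part of idx2numpy, and it is exercised on a tiny hand-built buffer rather than a real EMNIST file.

```python
import struct
import numpy as np

def parse_idx_ubyte(buf):
    """Decode an IDX buffer whose elements are unsigned bytes (type code 0x08)."""
    zero, dtype_code, ndim = struct.unpack(">HBB", buf[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    # One big-endian uint32 per dimension follows the magic number.
    dims = struct.unpack(">" + "I" * ndim, buf[4:4 + 4 * ndim])
    data = np.frombuffer(buf, dtype=np.uint8, offset=4 + 4 * ndim)
    return data.reshape(dims)

# Hand-built example: a 2x3 array holding the values 0..5.
header = struct.pack(">HBBII", 0, 0x08, 2, 2, 3)
arr = parse_idx_ubyte(header + bytes(range(6)))

print(arr.shape)  # (2, 3)
```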
Following this tutorial: https://www.tensorflow.org/versions/r1.3/get_started/mnist/pros
I wanted to solve a classification problem with labeled images by myself. Since I'm not using the MNIST database, I spent days creating my own dataset inside tensorflow. It looks like this:
#variables
batch_size = 50
dimension = 784
stages = 10
#step 1 read Dataset
filenames = tf.constant(filenamesList)
labels = tf.constant(labelsList)
#step 2 create Dataset
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
#step 3: parse every image in the dataset using `map`
def _parse_function(filename, label):
    #convert label to one-hot encoding
    one_hot = tf.one_hot(label, stages)
    #read image file
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_image(image_string, channels=3)
    image = tf.cast(image_decoded, tf.float32)
    return image, one_hot
#step 4 final input tensor
dataset = dataset.map(_parse_function)
dataset = dataset.batch(batch_size) #batch_size = 100
iterator = dataset.make_one_shot_iterator()
images, labels = iterator.get_next()
images = tf.reshape(images, [batch_size,dimension]).eval()
labels = tf.reshape(labels, [batch_size,stages]).eval()
for _ in range(10):
    dataset = dataset.shuffle(buffer_size = 100)
    dataset = dataset.batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    images, labels = iterator.get_next()
    images = tf.reshape(images, [batch_size,dimension]).eval()
    labels = tf.reshape(labels, [batch_size,stages]).eval()
    train_step.run(feed_dict={x: images, y_:labels})
Somehow, using a larger batch_size crashes Python. What I'm trying to do is train my neural network with new batches on each iteration. That's why I'm also using dataset.shuffle(...). Using dataset.shuffle also crashes Python.
What I wanted to do instead (because shuffle breaks) is to batch the whole dataset. By evaluating ('.eval()') I get a NumPy array. I would then shuffle the array with numpy.random.shuffle(images) and pick the first few elements to train on.
e.g.
for _ in range(1000):
    images = tf.reshape(images, [batch_size,dimension]).eval()
    labels = tf.reshape(labels, [batch_size,stages]).eval()
    #shuffle
    np.random.shuffle(images)
    np.random.shuffle(labels)
    train_step.run(feed_dict={x: images[0:train_size], y_:labels[0:train_size]})
But then the problem is that I can't batch my whole dataset. It looks like the data is too big for Python to work with.
How should I solve this differently?
Since I'm not using the MNIST database, there isn't a function like mnist.train.next_batch(100), which would come in handy.
Notice how you call shuffle and batch inside your for loop? This is wrong. Datasets in TF work in the style of functional programming, so you are actually defining a pipeline for preprocessing the data to feed into your model. In a way, you give a recipe that answers the question "given this raw data, which operations (map, etc.) should I do to get batches that I can feed into my neural network?"
Now you are modifying that pipeline for every batch! What happens is that the first iteration, the batch size is, say [32 3600]. The next iteration, the elements of this shape are batched again, to [32 32 3600], and so on.
There's a great tutorial on the TF website where you can find out more how Datasets work, but here are a few suggestions how you can resolve your problem.
Move the shuffling to right after "Step 2" in your code. Then you are shuffling the whole dataset, so your batches will have a good mixture of examples. Also increase the buffer_size argument; it works differently than you probably assume. It's usually a good idea to shuffle as early as possible, as it can be a slow operation if you have a large dataset: the shuffled part of the dataset has to be read into memory. Here it does not really matter whether you shuffle the filenames and labels or the decoded images and labels, but the latter involves more work since the dataset is larger by that time.
Move batching and the iterator generator to be the last steps, just before starting your training loop.
Don't use feed_dict with Dataset iterators to input data into your model. Instead, define your model in terms of the outputs of iterator.get_next() and omit the feed_dict argument. See more details from this Q&A: Tensorflow: create minibatch from numpy array > 2 GB
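To illustrate why buffer_size matters, here is a simplified pure-Python simulation of a shuffle buffer as I understand the documented behaviour: the buffer is filled with buffer_size elements, a random element is emitted, and its slot is refilled with the next input element. `buffered_shuffle` is a hypothetical helper for illustration, not TensorFlow code.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Simulate shuffling a stream with a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]                # emit a random buffered element
        buffer[idx] = item               # refill its slot with the new element
    rng.shuffle(buffer)                  # input exhausted: drain in random order
    yield from buffer

unshuffled = list(buffered_shuffle(range(10), buffer_size=1))
partly_shuffled = list(buffered_shuffle(range(100), buffer_size=10))

print(unshuffled)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With a buffer of 1 nothing is shuffled at all, and with a small buffer each element can only move a limited distance toward the front. A uniform shuffle of the whole dataset needs a buffer at least as large as the dataset, which is cheap when the elements are still just filenames and labels.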
I've been running into a lot of problems with creating TensorFlow datasets, so I decided to use OpenCV to import the images.
import cv2
import numpy as np

# read every image file into one array (all images must have the same size)
imgDataset = []
for f in files:
    imgDataset.append(cv2.imread(f))
imgDataset = np.asarray(imgDataset)
The shape of imgDataset is (num_img, height, width, col_channels). The i-th image is imgDataset[i].
Shuffling the dataset and taking only a batch of it can be done like this:
from sklearn.utils import shuffle
X, y = shuffle(X, y)
# take the first batch_size samples of the shuffled data
X_feed = X[:batch_size]
y_feed = y[:batch_size]
Then you feed X_feed and y_feed into your model.
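If you want to go through the whole dataset in shuffled mini-batches rather than take a single slice, one possible sketch uses a single random permutation to index both arrays, so each image stays paired with its label. It assumes NumPy only; `iterate_minibatches` is a hypothetical helper and the arrays are toy data.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X, y) mini-batches that keep samples and labels aligned."""
    order = rng.permutation(len(X))  # one permutation shared by X and y
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

# Toy data in which sample i has label i, so alignment is easy to check.
X = np.arange(10, dtype=np.float32).reshape(10, 1)
y = np.arange(10)

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(X, y, batch_size=4, rng=rng))

print([len(xb) for xb, _ in batches])  # [4, 4, 2]
```

Calling np.random.shuffle separately on the images and on the labels, by contrast, applies two different permutations and breaks the pairing.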
I am new to Deep Learning and I struggle with some data format on Keras. My CNN is based on the Stacked Hourglass Networks for Human Pose Estimation from A.Newell et al.
In this network the input is a 256x256 RGB image and the output should be a 64x64 heatmap highlighting body joints (shoulder, knee, ...). I managed to build the network, and I have all the data (images) with their annotations (pixel labels for body joints). I was wondering how I should format the input and output data of the training set to train my model. Currently I use a NumPy array of shape (256,256,3) for an image, and I don't know how to format my output. Should I create an array of shape [n,64,64,7]? (n being the size of the training set and 7 the number of filters I use to obtain a heatmap for 7 joints)
Thank you for your time.
The output can also be a numpy array.
Consider this example:
Training set: 50 images of size 256x256x3. This can be combined into a single numpy array of shape(50, 256, 256, 3).
A similar approach can be used to format the output data.
Sample code below:
#a, b and c are arrays of size 256x256x3
import numpy as np
temp = []
temp.append(a)
temp.append(b)
temp.append(c)
#stack along a new first axis
output_labels = np.stack(temp)
The output_labels array will have shape (3, 256, 256, 3).
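For the heatmap targets themselves, a common choice is one 2D Gaussian per joint, centred on the joint location. The sketch below assumes NumPy, joint coordinates already scaled to the 64x64 output grid, and a made-up sigma of 1.5; `joints_to_heatmaps` is a hypothetical helper, and none of these values come from the question.

```python
import numpy as np

def joints_to_heatmaps(joints_xy, size=64, sigma=1.5):
    """Build a (size, size, n_joints) target with one Gaussian per joint."""
    ys, xs = np.mgrid[0:size, 0:size]  # row (y) and column (x) index grids
    heatmaps = np.zeros((size, size, len(joints_xy)), dtype=np.float32)
    for k, (x, y) in enumerate(joints_xy):
        heatmaps[:, :, k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return heatmaps

# Made-up (x, y) positions for 7 joints of one training image.
joints = [(10, 20), (32, 32), (40, 12), (5, 5), (60, 50), (25, 45), (33, 8)]
target = joints_to_heatmaps(joints)

# Stacking the per-image targets gives the (n, 64, 64, 7) array from the question.
targets = np.stack([target, target])
print(targets.shape)  # (2, 64, 64, 7)
```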
Keras recommends creating a data generator to feed the training data and ground truth to the network.
For the stacked hourglass network specifically, you can refer to my implementation for details: https://github.com/yuanyuanli85/Stacked_Hourglass_Network_Keras/tree/master/src/data_gen