I'm very new to TensorFlow and Python. I have a dataset that is very similar to MNIST (28 * 28 images). I have been following a lot of online tutorials on implementing a basic neural network with TensorFlow, and most of them just use:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot = True)
Is there a way for me to use my own MNIST-like data instead of importing it from tensorflow? Furthermore, will I still be able to use mnist.train.next_batch with the MNIST-like data? Thank you.
The MNIST dataset used in the TensorFlow tutorial consists of 4 files:
train-images-idx3-ubyte
train-labels-idx1-ubyte
t10k-images-idx3-ubyte
t10k-labels-idx1-ubyte
The first two are the training images and training labels; the other two are the test images and test labels. The pixel values and labels are stored as raw byte streams in these files. If your dataset has exactly the same format as the MNIST files above, you can use the same approach. The images and labels are read with the extract_images and extract_labels functions defined here.
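To make the byte layout a bit more concrete, here is a minimal sketch of reading such files with NumPy. The function names read_idx_images and read_idx_labels are made up for this example and are not TensorFlow's helpers. It assumes uncompressed files with the usual IDX magic numbers (2051 for images, 2049 for labels); the downloadable files are normally gzipped, in which case gzip.open would be needed instead of open.

```python
import struct

import numpy as np


def read_idx_images(path):
    """Parse an uncompressed IDX3 image file into a (count, rows, cols) uint8 array."""
    with open(path, "rb") as f:
        data = f.read()
    # Header: four big-endian 32-bit integers (magic, count, rows, cols)
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 2051:
        raise ValueError("not an IDX3 image file (magic=%d)" % magic)
    # The pixels follow the header, one unsigned byte each.
    # The returned array is read-only; call .copy() if you need to modify it.
    return np.frombuffer(data, dtype=np.uint8, offset=16).reshape(count, rows, cols)


def read_idx_labels(path):
    """Parse an uncompressed IDX1 label file into a (count,) uint8 array."""
    with open(path, "rb") as f:
        data = f.read()
    # Header: two big-endian 32-bit integers (magic, count)
    magic, count = struct.unpack(">II", data[:8])
    if magic != 2049:
        raise ValueError("not an IDX1 label file (magic=%d)" % magic)
    labels = np.frombuffer(data, dtype=np.uint8, offset=8)
    if labels.shape[0] != count:
        raise ValueError("label count does not match the header")
    return labels
```

If your own data is written with the same header layout, the same two functions should be able to read it back.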
It is also up to you to store your data in any other format (a TFRecord file of tf.Example records may actually be easier). Take a look at the new API too.
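On the next_batch part of the question: mnist.train.next_batch is a method of the tutorial's dataset object, so plain arrays will not have it, but something similar is short to write. A minimal sketch follows; the class name ArrayDataSet is hypothetical, and it only imitates the shuffle-per-epoch behaviour rather than reproducing the tutorial's implementation.

```python
import numpy as np


class ArrayDataSet:
    """Hypothetical stand-in for the tutorial's dataset object (not TensorFlow code)."""

    def __init__(self, images, labels, seed=0):
        self.images = np.asarray(images)
        self.labels = np.asarray(labels)
        if len(self.images) != len(self.labels):
            raise ValueError("images and labels must have the same length")
        self._rng = np.random.default_rng(seed)
        self._order = self._rng.permutation(len(self.images))
        self._pos = 0

    def next_batch(self, batch_size):
        """Return the next (images, labels) batch, reshuffling when an epoch ends."""
        if self._pos + batch_size > len(self._order):
            # Not enough samples left in this epoch: reshuffle and start over
            self._order = self._rng.permutation(len(self.images))
            self._pos = 0
        idx = self._order[self._pos:self._pos + batch_size]
        self._pos += batch_size
        return self.images[idx], self.labels[idx]
```

With your arrays loaded, train = ArrayDataSet(images, labels) followed by train.next_batch(100) can then stand in for mnist.train.next_batch(100) in the training loop. Leftover samples at the end of an epoch are simply dropped in this sketch.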
There are many guides about loading and splitting the MNIST dataset, like this one. They use libraries such as Keras or TensorFlow.
I would like to load the MNIST dataset and split it into training, validation, and test sets from scratch, that is, using only built-in Python features (and the NumPy library, if needed).
This is the link to the dataset: MNIST dataset.
Can you help me?
You may look at the source code of Tensorflow or Keras to see how they download it without other libraries.
Here is the relevant piece of code in PyTorch.
It uses this helper code. As far as I can see that code only uses standard libraries. You may reuse their code (BSD-3 Clause License) or read theirs to see what you have to do and then write your own.
Once the data is on your disk and you can load it, there are several options to create a custom train/validate/test split: Python splitting data into random sets
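For the splitting step, one possible sketch with NumPy only is below. The function name train_val_test_split and the default fractions are my own choices, and it assumes the images and labels are already loaded as arrays of equal length.

```python
import numpy as np


def train_val_test_split(images, labels, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle once, then cut the shuffled indices into test, validation, and training parts."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    n = len(images)
    if len(labels) != n:
        raise ValueError("images and labels must have the same length")
    # A single permutation keeps every image paired with its label
    order = np.random.default_rng(seed).permutation(n)
    n_test = int(n * test_fraction)
    n_val = int(n * val_fraction)
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return (
        (images[train_idx], labels[train_idx]),
        (images[val_idx], labels[val_idx]),
        (images[test_idx], labels[test_idx]),
    )
```

Since MNIST already ships with a separate test set, you may prefer to pass test_fraction=0 and only carve a validation set out of the training files.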
I am attempting to build a multi-input CNN model.
Specifically, the model classifies words ("arigatou", "hai", ...) into 20 classes, as shown in the attached image.
This is a good example.
For this purpose, the assumed input format is images from 4 channels fed to the model simultaneously.
However, I am having trouble figuring out how to process the image data.
Please let me know if there is a way to use ImageDataGenerator to create training data from a directory structure like the one in the image.
Thank you very much.
sample URL:
https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator?hl=zh-tw
Multiple ImageDataGenerator
All you have to do is make a pandas DataFrame of all the image paths and then use ImageDataGenerator on it (via its flow_from_dataframe method).
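As a rough sketch of the first half of that suggestion, the snippet below collects image paths and labels from a directory tree. The attached image is not available here, so the layout root/<label>/<image file> is purely an assumption, as are the function name list_image_files and the extension list. The keys "filename" and "class" are chosen because they are the default x_col and y_col of flow_from_dataframe.

```python
import os


def list_image_files(root, extensions=(".png", ".jpg", ".jpeg")):
    """Return one {"filename", "class"} record per image under root/<label>/ (assumed layout)."""
    rows = []
    for label in sorted(os.listdir(root)):
        class_dir = os.path.join(root, label)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        for name in sorted(os.listdir(class_dir)):
            # Case-insensitive extension check; extensions must be a tuple
            if name.lower().endswith(extensions):
                rows.append({"filename": os.path.join(class_dir, name), "class": label})
    return rows
```

pd.DataFrame(list_image_files("path/to/data")) then gives a DataFrame that can be handed to flow_from_dataframe. Combining four such streams into one multi-input batch is a separate step that this sketch does not cover.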
I'm trying to use Keras with TensorFlow to train a network. I have my own digit dataset for the Myanmar language and want to build Myanmar digit recognition with a neural network in Python. First, I want to load the labeled training data from a .csv file, and also the unlabeled test data from another .csv file. The problem is how to load the data from those files. Please explain in detail.
This is an example of loading the CSV files into DataFrames; from there you can do the data engineering and move on to the neural network model.
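import pandas as pd

label = ['digit1_', 'digit2_']  # .... one prefix per labeled file
frames = []
for i in label:
    # Read each labeled file and keep all of them, not only the last one
    frames.append(pd.read_csv(i + 'trainset.csv'))
train_set = pd.concat(frames, ignore_index=True)
test_set = pd.read_csv('testset.csv')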
Is this what you mean?
Just do,
train_set = pd.read_csv("train_set.csv")
and similarly for test set as well!
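To go one step past read_csv, here is a small sketch of turning the loaded frames into arrays a network can consume. The column name "label" and the four pixel columns are made up for the example (the real files will have their own header), and the CSV content is a tiny in-memory stand-in so the snippet runs on its own.

```python
import io

import pandas as pd

# Tiny in-memory stand-ins for train_set.csv / test_set.csv (made-up data, 4 "pixels" per row)
train_csv = io.StringIO("label,p0,p1,p2,p3\n3,0,255,128,0\n7,255,0,0,64\n")
test_csv = io.StringIO("p0,p1,p2,p3\n0,0,255,255\n")

train_set = pd.read_csv(train_csv)
test_set = pd.read_csv(test_csv)

# Separate the (assumed) label column from the pixel columns and scale pixels to [0, 1]
y_train = train_set["label"].to_numpy()
x_train = train_set.drop(columns=["label"]).to_numpy(dtype="float32") / 255.0
x_test = test_set.to_numpy(dtype="float32") / 255.0
```

With real files, replace the StringIO objects with the file paths; x_train and y_train can then be passed to model.fit.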
I'm trying to understand how to read local images, use them as a TensorFlow Dataset, and train a Keras model with that Dataset. I'm following the TF Keras MNIST TPU tutorial. The only difference is that I want to read my own set of images and train on them.
Let's say I have a list of images (file names) and a corresponding list of labels.
files = [...] # list of file names
labels = [...] # list of labels (integers)
images = tf.constant(files) # or tf.convert_to_tensor(files)
labels = tf.constant(labels) # or tf.convert_to_tensor(labels)
dataset = tf.data.Dataset.from_tensor_slices((images, labels))
dataset = dataset.shuffle(len(files))
dataset = dataset.repeat()
dataset = dataset.map(parse_function).batch(batch_size)
The parse_function is a simple function which reads the input file name and yields the image data and corresponding label, e.g.
def parse_function(filename, label):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_image(image_string)
    image = tf.cast(image_decoded, tf.float32)
    return image, label
At this point I have a dataset of type tf.data.Dataset (more precisely tf.data.BatchDataset), and I pass it to the Keras model trained_model from the tutorial, e.g.
history = trained_model.fit(dataset, ...)
But at this point code breaks with the following error:
AttributeError: 'BatchDataset' object has no attribute 'ndim'
The error comes from Keras, which checks the given input like this:
from keras import backend as K
K.is_tensor(dataset) # which returns false
Keras tries to determine type of the input and since it is not a tensor it assumes it is numpy array and tries to get its dimension. That's why the error occurs.
My questions here are the following:
Am I reading the TF dataset correctly? I looked up plenty of examples on the internet, and it seems I'm reading it the way people suggest.
Why is my dataset not a tensor? Maybe I need to perform an additional conversion, but that is not the case in the TF tutorial.
Why does everything work with tf datasets in the TF tutorial? I really don't see any difference between the way they read the MNIST data (which is in a different data format, but eventually they get images) and what I'm doing here.
Any suggestion would be greatly appreciated.
Please note that even though the TF tutorial is about TPUs, it is structured so that it works on both TPUs and CPUs/GPUs.
It turns out the problem was in the Keras model I was using. The example in the TF tutorial relies on a Keras model built with the tf.keras module (all layers, the model, etc. come from tf.keras), while the model I was using (DenseNet) relies on the standalone keras module, i.e. all its layers come from keras and not from tf.keras. This causes the tf.data.Dataset to be checked for ndim in the fit method of the keras model. Once I switched my DenseNet to tf.keras layers, everything worked again.
I am new to machine learning and recently took a course by Andrew Ng on Coursera.
After that I moved to Python and used pandas, NumPy, and scikit-learn to implement ML algorithms.
Now, while surfing, I came across TensorFlow, found it pretty amazing, and implemented this example, which takes MNIST data as input.
Now I want to read my own custom images and use them for training. I am confused about how I should convert the images to MNIST-style data, or whether there is some other way to train my network.
I took this tutorial to create my network.
Information on the MNIST dataset can be found on Yann LeCun's website.
The TensorFlow module tensorflow.examples.tutorials.mnist.mnist_softmax.py looks to be acquiring/preparing the dataset for the train/test steps.
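The MNIST dataset pairs each image of a handwritten digit with a corresponding label. If you would like to build such image/label pairs from your own images, you could read the images with scipy.misc.imread (note that it was removed in SciPy 1.2; imageio.imread or Pillow's Image.open are common replacements).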
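Once the images are in memory as arrays, converting them to MNIST-style data is mostly a reshape. Below is a minimal sketch; the function name to_mnist_format is made up, and it assumes 8-bit grayscale images of equal size plus integer class labels. As far as I know, the output mirrors what the tutorial loader yields with one_hot=True (flattened float32 rows in [0, 1] and one-hot label rows), but treat that as an assumption.

```python
import numpy as np


def to_mnist_format(images, labels, num_classes=10):
    """Flatten uint8 grayscale images to float32 rows in [0, 1] and one-hot encode the labels."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    # (count, rows, cols) -> (count, rows * cols), scaled from 0..255 to 0..1
    flat = images.reshape(len(images), -1).astype(np.float32) / 255.0
    # One row per sample with a single 1.0 in the column of its class
    one_hot = np.zeros((len(labels), num_classes), dtype=np.float32)
    one_hot[np.arange(len(labels)), labels] = 1.0
    return flat, one_hot
```

For 28 * 28 inputs this gives rows of length 784, which is the shape the softmax tutorial's placeholders expect.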