Loading a dataset from a CSV file for Keras in Python

I'm trying to use Keras with TensorFlow to train a network. I have my own dataset of Myanmar-language digits, and I want to build a Myanmar digit recognizer with a neural network in Python. First I want to load the labeled training data from one .csv file and the unlabeled test data from another .csv file. The problem is how to load the data from those files. Please explain in detail.

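Here is an example of loading the CSV files into DataFrames; from there you can do the data engineering part and move on to the neural network model.
import pandas as pd
# One training file per digit class, e.g. digit1_trainset.csv, digit2_trainset.csv
label = ['digit1_', 'digit2_']  # ... and so on for the remaining prefixes
# Read every per-class file and stack them, instead of overwriting train_set on each pass
train_set = pd.concat([pd.read_csv(i + 'trainset.csv') for i in label], ignore_index=True)
test_set = pd.read_csv('testset.csv')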
Is this what you mean?

Just do,
train_set = pd.read_csv("train_set.csv")
and similarly for test set as well!
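To get from the DataFrames to arrays that Keras can train on, something like the following may work. It is only a sketch under assumptions the question does not confirm: the training CSV has a column called `label` plus one column per pixel, the test CSV has pixel columns only, and pixel values run from 0 to 255. The function name and the `label` column name are made up for illustration.

```python
import pandas as pd


def load_digit_csv(path, label_column=None):
    """Read a digit CSV and return (pixels, labels).

    label_column is a hypothetical column name. Pass None for an
    unlabeled file; labels is then returned as None.
    """
    frame = pd.read_csv(path)
    labels = None
    if label_column is not None:
        # Split the label column off from the pixel columns
        labels = frame.pop(label_column).to_numpy(dtype="int64")
    # Scale pixel values from the assumed 0-255 range down to 0-1
    pixels = frame.to_numpy(dtype="float32") / 255.0
    return pixels, labels
```

With the file names from the answer above, usage would look like `x_train, y_train = load_digit_csv("train_set.csv", label_column="label")` and `x_test, _ = load_digit_csv("test_set.csv")`; the arrays can then be passed to `model.fit` and `model.predict`.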

Related

How to import a .mat dataset into Python and divide it into train and test sets

I have preprocessed a dataset (EEG-fNIRS) with EEGLAB; it is available at "https://drive.google.com/drive/folders/1fCQM5PTvy69yhtFaKryCeJiQRC7huO1f?usp=sharing"
Now I am trying to do deep learning classification, but first I need to convert the data into images and then divide it into train and test parts. I would appreciate your help.
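A rough sketch of the loading and splitting steps, assuming the file can be read with `scipy.io.loadmat` (files saved in MATLAB's v7.3 format cannot). The variable names stored inside the .mat file are not given in the question, so `data_key` and `label_key` are placeholders, and both helper functions are hypothetical. The conversion to images is not covered here.

```python
import numpy as np


def load_mat_arrays(path, data_key, label_key):
    """Load a data array and a label array from a MATLAB .mat file.

    data_key and label_key are placeholders for the variable names
    actually stored in the file.
    """
    # Imported here so the split helper below can be used without SciPy
    from scipy.io import loadmat

    contents = loadmat(path)
    return np.asarray(contents[data_key]), np.asarray(contents[label_key]).ravel()


def split_train_test(samples, labels, test_fraction=0.2, seed=0):
    """Shuffle the samples and split them into train and test parts."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))
    n_test = int(round(len(samples) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (samples[train_idx], labels[train_idx]), (samples[test_idx], labels[test_idx])
```

The split assumes the first axis of `samples` indexes trials; if the array is laid out differently it would need to be transposed first.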

Load and prepare a new dataset

I'm using TensorFlow to create a sentiment analysis model. Since I'm new to machine learning, I followed a guide in the official TensorFlow documentation to train and test a model with the IMDB_reviews dataset. It works pretty well, but I would like to train it with another dataset.
So I've downloaded this dataset: "movie_review.csv". It contains various columns and I want to access text and tag (where the tag is a positive or negative value and text is the text of the review).
What I want to do is to prepare the CSV as a dataset, access text and tag, vectorize them, and feed them to the network. There is no division between test and train, so I have to divide the file too.
So, I want to know how to:
0- Access the file I've downloaded and transform it into a dataset.
1- Access text and tag in the file, maybe without using pandas. If pandas is recommended and there is a simple way to read the file and pass it to a network using TensorFlow, I'm okay with that answer.
2- Split the file into a test set and a train set (I've actually already found a pandas solution for this).
3- Vectorize my text and tag to feed my network.
If you have a complete guide on how to do this, that's fine; it just has to use TensorFlow.
Questions 0 to 3 have been answered
OK, so I used the posted file to load a dataset and train the model on short sentences, but I'm having trouble with the training.
When I followed the guide to build the text classification model, I ended up with this code:
import tensorflow as tf
import tensorflow_datasets as tfds
dataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True, as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']
encoder = info.features['text'].encoder
BUFFER_SIZE = 10000
BATCH_SIZE = 64
# Each element is (variable-length sequence of token ids, scalar label)
padded_shapes = ([None], ())
train_dataset = train_dataset.shuffle(BUFFER_SIZE).padded_batch(BATCH_SIZE, padded_shapes=padded_shapes)
test_dataset = test_dataset.padded_batch(BATCH_SIZE, padded_shapes=padded_shapes)
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
# cp_callback (the checkpoint callback) is defined in a part of the script not shown here
history = model.fit(train_dataset, epochs=1, validation_data=test_dataset, validation_steps=30, callbacks=[cp_callback])
So I trained my model this way (some parts are missing, but I have included all the essential ones). After this I wanted to train the model on another dataset, and thanks to Andrew I was able to load a dataset I created myself this way:
csv_dataset = tf.data.experimental.CsvDataset(filepath, default_values, header=header)
def reshape_dataset(txt, tag):
    txt = tf.reshape(txt, shape=(1,))
    tag = tf.reshape(tag, shape=(1,))
    return txt, tag
csv_dataset = csv_dataset.map(reshape_dataset)
training = csv_dataset.take(10)
testing = csv_dataset.skip(10)
My problem is adapting the dataset to the model I already have. I have tried various solutions, but I get shape errors.
Could somebody kindly explain how to do this? The solution for step 3 has already been posted by Andrew in his file, but I'd like to use my model with the weights I saved during training.
This sounds like a great place to use TensorFlow's Dataset API. Here's a notebook/tutorial, right from TensorFlow's website, that covers basic data input and preprocessing.
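As a rough illustration of questions 0 to 2 with the Dataset API (not necessarily what the linked notebook does), the sketch below reads the `text` and `tag` columns with pandas, wraps them in a `tf.data.Dataset`, and splits off a test set. It assumes the tag values are the strings `pos` and `neg`, which the question does not confirm; the function name and default parameters are made up.

```python
import pandas as pd
import tensorflow as tf


def make_text_datasets(csv_path, test_fraction=0.2, batch_size=32, seed=0):
    """Build batched (train, test) datasets of (text, label) pairs."""
    frame = pd.read_csv(csv_path, usecols=["text", "tag"])
    # Assumed tag encoding: "pos" becomes 1, anything else becomes 0
    labels = (frame["tag"] == "pos").astype("int64").to_numpy()
    texts = frame["text"].astype(str).tolist()

    dataset = tf.data.Dataset.from_tensor_slices((texts, labels))
    # Shuffle once in a fixed order so that take/skip yield disjoint splits
    dataset = dataset.shuffle(len(frame), seed=seed, reshuffle_each_iteration=False)
    n_test = int(len(frame) * test_fraction)
    test = dataset.take(n_test).batch(batch_size)
    train = dataset.skip(n_test).batch(batch_size)
    return train, test
```

For question 3, one option for a model trained from scratch is a `tf.keras.layers.TextVectorization` layer adapted on the training texts; a model that was already trained with a different encoder needs that same encoder instead.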
I have also made a notebook with a quick example, answering each of your questions with implementations. You can find that here.

Loading tfrecord into Keras through ImageDataGenerator class

I am fairly new to Keras and I am trying the transfer learning tutorial here:
https://www.tensorflow.org/tutorials/images/transfer_learning
My dataset, however, is not binary, and it is stored as a TFRecord file, which I can read in TensorFlow. I do not want to feed the images directly as input to the network, since the input comes from the pre-trained model. How can I pass the images and labels to the ImageDataGenerator class in Keras?
For anyone who runs into this issue in the future: if the pre-training process is all correct, you can use the tf.data API to read and prepare the images for training, and the resulting (image, label) dataset can be fed to the .fit method of your model.
Look at this great post to get familiar with reading a TFRecord file:
https://medium.com/#moritzkrger/speeding-up-keras-with-tfrecord-datasets-5464f9836c36
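A minimal sketch of that tf.data route, under assumptions: each record is taken to be a `tf.train.Example` with a JPEG-encoded `image` feature and an integer `label` feature. These feature names, the 160x160 target size, and the function names are all hypothetical and must be adapted to how the file was actually written.

```python
import tensorflow as tf

# Hypothetical feature names and types; they must match the writer's schema
FEATURES = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def parse_example(serialized, image_size=(160, 160)):
    """Decode one serialized tf.train.Example into an (image, label) pair."""
    parsed = tf.io.parse_single_example(serialized, FEATURES)
    image = tf.io.decode_jpeg(parsed["image"], channels=3)
    # Resize and scale to [0, 1]; swap in whatever preprocessing the
    # pre-trained model expects
    image = tf.image.resize(image, image_size) / 255.0
    return image, parsed["label"]


def make_dataset(tfrecord_path, batch_size=32):
    """Build a shuffled, batched dataset that can be passed to model.fit."""
    dataset = tf.data.TFRecordDataset(tfrecord_path)
    dataset = dataset.map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    return dataset.shuffle(1000).batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

The result can be passed straight to `model.fit(make_dataset(path), epochs=...)`, with no ImageDataGenerator involved.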

Using my own data in TensorFlow for a neural network implementation

I'm very new to TensorFlow and Python. I have a dataset, very similar to the MNIST dataset (28 * 28 image). I have been following a lot of the online tutorials on how to implement a basic neural network with tensorflow and found that most of them just use:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot = True)
Is there a way for me to use my own MNIST-like data instead of importing it from tensorflow? Furthermore, will I still be able to use mnist.train.next_batch with the MNIST-like data? Thank you.
The MNIST dataset used in tensorflow tutorial includes 4 files:
train-images-idx3-ubyte
train-labels-idx1-ubyte
t10k-images-idx3-ubyte
t10k-labels-idx1-ubyte
The first two are the training images and training labels; the next two are the test images and test labels. The pixel values and labels are stored as byte streams in the files. If your dataset has exactly the same format as the MNIST files above, you can definitely use the same approach. The images and labels are read by the extract_images and extract_labels functions defined here.
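If the data is stored in that same IDX layout, it can also be read without the tutorial helper. The sketch below assumes uncompressed files like the four listed above, with a big-endian header: magic number 2051 for image files or 2049 for label files, followed by the dimension sizes. The function names are made up for illustration.

```python
import struct

import numpy as np


def read_idx_images(path):
    """Read an IDX3 image file into a uint8 array of shape (count, rows, cols)."""
    with open(path, "rb") as handle:
        magic, count, rows, cols = struct.unpack(">IIII", handle.read(16))
        if magic != 2051:
            raise ValueError(f"unexpected magic number {magic} for an image file")
        pixels = np.frombuffer(handle.read(), dtype=np.uint8)
    return pixels.reshape(count, rows, cols)


def read_idx_labels(path):
    """Read an IDX1 label file into a uint8 array of shape (count,)."""
    with open(path, "rb") as handle:
        magic, count = struct.unpack(">II", handle.read(8))
        if magic != 2049:
            raise ValueError(f"unexpected magic number {magic} for a label file")
        labels = np.frombuffer(handle.read(), dtype=np.uint8)
    if labels.size != count:
        raise ValueError("label count does not match the file header")
    return labels
```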
You are also free to store your data in any other format (a TFRecord file of tf.Example records may actually be easier). Take a look at the new API too.
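As for `mnist.train.next_batch`: that method belongs to the tutorial's own dataset object, so it will not exist on arrays you load yourself. A small stand-in is easy to write, though. The class below is a hypothetical sketch of the same idea, not the tutorial's implementation; it reshuffles at the start of each epoch and drops the leftover samples that do not fill a whole batch.

```python
import numpy as np


class BatchFeeder:
    """Hypothetical stand-in for next_batch on in-memory NumPy arrays."""

    def __init__(self, images, labels, seed=0):
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        self._images = images
        self._labels = labels
        self._rng = np.random.default_rng(seed)
        self._order = self._rng.permutation(len(images))
        self._position = 0

    def next_batch(self, batch_size):
        """Return the next (images, labels) batch in shuffled order."""
        if self._position + batch_size > len(self._order):
            # Not enough samples left in this epoch: reshuffle and restart
            self._order = self._rng.permutation(len(self._images))
            self._position = 0
        picked = self._order[self._position:self._position + batch_size]
        self._position += batch_size
        return self._images[picked], self._labels[picked]
```

Usage would be `feeder = BatchFeeder(train_images, train_labels)` followed by `batch_x, batch_y = feeder.next_batch(100)` inside the training loop.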

Processing Image To Feed data in to Convolutional Neural Network

I am new to machine learning and recently took a course by Andrew Ng on Coursera.
After that I moved to Python and used pandas, NumPy, and scikit-learn to implement ML algorithms.
While browsing I came across TensorFlow, found it pretty amazing, and implemented this example, which takes MNIST data as input.
Now I want to read my own custom images and use them for training. I am confused about how to convert the images to MNIST-style data, or whether there is some other way to train my network.
I took this tutorial to create my network.
Information on the MNIST dataset can be found on Yann LeCun's website.
The TensorFlow module tensorflow.examples.tutorials.mnist.mnist_softmax.py looks to be acquiring/preparing the dataset for the train/test steps.
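The MNIST dataset pairs each image of a handwritten digit with a corresponding label. To read a new image into an array, you could use scipy.misc.imread (removed from recent SciPy releases; imageio.imread or Pillow are the usual replacements). The labels for your own images you have to supply yourself.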
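A minimal sketch of converting a custom image to MNIST-style input, using Pillow rather than scipy.misc.imread. It assumes the tutorial's network takes flat 784-value vectors scaled to [0, 1] and one-hot labels over 10 classes; the function names are made up. MNIST digits are light on a dark background, so images with dark ink on white paper would probably need inverting as well.

```python
import numpy as np
from PIL import Image


def image_to_mnist_vector(path, size=(28, 28)):
    """Load an image file as a flat float32 vector with values in [0, 1]."""
    with Image.open(path) as image:
        # Convert to 8-bit greyscale and shrink to the MNIST resolution
        grey = image.convert("L").resize(size)
    pixels = np.asarray(grey, dtype=np.float32) / 255.0
    return pixels.reshape(-1)


def one_hot(label, num_classes=10):
    """Encode an integer class label as a one-hot float32 vector."""
    vector = np.zeros(num_classes, dtype=np.float32)
    vector[label] = 1.0
    return vector
```

Stacking the vectors with `np.stack` gives arrays shaped like the tutorial's image and label batches.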
