Train Keras Model on large number of .mat files - python

I want to train a neural network model in Python using Keras. The data I have is a bunch of .mat files containing data associated with ECG signals.
All the .mat files have the same structure but different values. However, some array sizes differ between files: for example, one file contains an array called "P_Signal" with a size of 9000, while another file has it with a size of 18000.
As far as I know, I should prepare a CSV file containing all the necessary attributes plus a label in each column so that Keras can understand it.
Unfortunately, this is impossible due to the large arrays and fields inside the .mat files (a CSV file cannot load more than 16,384 columns, and I have some arrays of 18,000 elements each). So I need to find another way to load this data into Keras.
Appreciate your help a lot.
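A minimal sketch of one possible approach (not a definitive answer): Keras does not actually need a CSV; the .mat files can be read directly with scipy.io.loadmat and streamed to model.fit() through tf.data.Dataset.from_generator. The label field "label", the file pattern and the fixed signal length below are assumptions; signals of different lengths are zero-padded or truncated to a single size here.

import glob
import numpy as np
from scipy.io import loadmat
import tensorflow as tf

FIXED_LEN = 18000  # assumed target length; shorter signals are zero-padded

def mat_generator(file_pattern):
    for path in glob.glob(file_pattern):
        mat = loadmat(path)
        signal = np.ravel(mat["P_Signal"]).astype("float32")
        padded = np.zeros(FIXED_LEN, dtype="float32")
        padded[:min(len(signal), FIXED_LEN)] = signal[:FIXED_LEN]
        label = float(np.ravel(mat["label"])[0])  # hypothetical label field
        yield padded, label

dataset = tf.data.Dataset.from_generator(
    lambda: mat_generator("ecg_data/*.mat"),   # assumed directory
    output_signature=(
        tf.TensorSpec(shape=(FIXED_LEN,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.float32),
    ),
).batch(32)

# model.fit(dataset, epochs=10)

Only one batch of signals is materialised at a time, so memory use stays small regardless of how many .mat files there are.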

Related

Loading images with multiple numeric values for neural network

I have a dataset of images and their two regression values in a CSV file. For example, "img1.jpg" has two numeric values, "x_cord" and "y_cord", stored in annotation.csv. I want to train my neural network on the images together with these two values from the CSV file, but I'm not able to load them. Could someone suggest a way to load both of them together and feed them as input to the neural network?
Many thanks.
I have tried flow_from_dataframe, but it only takes one numeric value, so I don't know how to load multiple numeric values along with an image.
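One possible sketch (the column name "filename", the image directory and the input size are assumptions): bypass flow_from_dataframe and build a tf.data pipeline that pairs each image with its two regression targets.

import pandas as pd
import tensorflow as tf

df = pd.read_csv("annotation.csv")

def load_image(path, targets):
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, [224, 224]) / 255.0   # assumed input size
    return img, targets

paths = ("images/" + df["filename"]).values            # assumed image folder
targets = df[["x_cord", "y_cord"]].values.astype("float32")

dataset = (tf.data.Dataset.from_tensor_slices((paths, targets))
           .map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))

# model.fit(dataset, epochs=10)  # the model's final Dense layer needs 2 units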

How to feed large NumPy arrays to tf.fit()

I have two NumPy arrays saved with the .npy file extension. One contains the x_train data and the other contains the y_train data.
The x_train.npy file is 5.7 GB in size, so I can't feed it to training by loading the whole array into memory.
Every time I try to load it into RAM and train the model, Colab crashes before training starts.
Is there a way to feed large NumPy files to tf.fit()?
Files I have:
"x_train.npy" 5.7GB
"y_train.npy"
Depending on how much RAM your device has, loading the whole array at once may simply not be possible from a hardware point of view.
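One possible workaround, sketched below: open the .npy files with np.load(..., mmap_mode="r") so slices are read from disk on demand, and hand batches to model.fit() through a tf.keras.utils.Sequence, so the full 5.7 GB array never has to sit in RAM.

import math
import numpy as np
import tensorflow as tf

class NpyBatchSequence(tf.keras.utils.Sequence):
    def __init__(self, x_path, y_path, batch_size=32):
        self.x = np.load(x_path, mmap_mode="r")   # memory-mapped, not loaded
        self.y = np.load(y_path, mmap_mode="r")
        self.batch_size = batch_size

    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        # copy just this batch into ordinary in-memory arrays
        return np.array(self.x[sl]), np.array(self.y[sl])

train_seq = NpyBatchSequence("x_train.npy", "y_train.npy", batch_size=32)
# model.fit(train_seq, epochs=10)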

How to load flattened 3D data from .dat files in a directory for training a neural network in TensorFlow

I have a data set containing 3-dimensional labeled numerical (float) data stored in two directories named, say, NEGATIVE and POSITIVE, respectively. In each directory, each data point (a vector, actually) is written to an individual .dat (or .txt) file. Each original data vector is 3D, meaning it has a shape of, for example, 64×64×64, which is flattened into a 1D array before being written to the file (the file then contains one column and 64*64*64 rows).
I want to build a neural network in Python with TensorFlow (Keras) taking these .dat files as inputs (just like taking images, but with numerical data instead). However, there doesn't seem to be a built-in function that facilitates this. I thought about loading all the data files together and manually reshaping them into a training list and a test list of numpy arrays (each element of the list having shape 64*64*64, i.e. a single data vector) plus lists of their labels, but that would require an insane amount of RAM (the data set is really large). So I wonder if there is a way to load them in small batches for the network to train on (used in the .fit() method), just like one would do when loading images with 'image_dataset_from_directory'?
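A rough sketch under the stated assumptions (directory names NEGATIVE/POSITIVE, one float per row, 64*64*64 values per file): list the file paths per class, then let a tf.data pipeline parse and reshape each file lazily, so only one batch is held in memory at a time.

import glob
import numpy as np
import tensorflow as tf

def list_files(root):
    files, labels = [], []
    for label, sub in enumerate(["NEGATIVE", "POSITIVE"]):
        found = glob.glob(f"{root}/{sub}/*.dat")
        files += found
        labels += [label] * len(found)
    return files, labels

def parse_dat(path, label):
    def _read(p):
        data = np.loadtxt(p.numpy().decode()).astype("float32")
        return data.reshape(64, 64, 64)
    volume = tf.py_function(_read, [path], tf.float32)
    volume.set_shape((64, 64, 64))
    return volume, label

files, labels = list_files("data")   # assumed root directory
dataset = (tf.data.Dataset.from_tensor_slices((files, labels))
           .shuffle(len(files))
           .map(parse_dat, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(8)
           .prefetch(tf.data.AUTOTUNE))

# model.fit(dataset, epochs=10)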

Saving custom variables in Keras .h5 file

I'm developing an RNN for a project, and I need to train it on one computer and be able to predict on another. The solution I found is to save the model into an .h5 file using the code below:
... # Train the data etc....
model.save("model.h5")
My problem is that I need to store some metadata from my training dataset and pre-processing, and be able to load it together with the model (e.g. name of the dataset file, size of the dataset file, number of characters, etc.).
I don't want to store this information in a second file (e.g. a .txt file), because then I would have to manage two files, and I don't want to use any additional library or framework for this task.
I was thinking (brainstorming) a code like this:
model.save("model.h5", metaData={'myVariableName': myVariable})
And to load would be:
myVariable = model.load("model.h5").getMetaData('myVariableName')
I know this is not possible in the current version, and I have already read the Keras docs, but I couldn't find any efficient method to do that. Note that what I'm asking is different from custom_objects, because I want to save and load my own variables.
Is there a smarter approach to solve this problem?
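One possible workaround, sketched below (not an official Keras feature): since Keras already depends on h5py to write the .h5 file, the saved file can be reopened after model.save() and the metadata stored as HDF5 attributes, so there is no second file and no library beyond what saving already requires. The attribute names here are just examples.

import h5py
from tensorflow import keras

# 'model' is the trained RNN from the code above
model.save("model.h5")
with h5py.File("model.h5", "a") as f:            # reopen in append mode
    f.attrs["dataset_file"] = "my_dataset.txt"   # example metadata
    f.attrs["dataset_size"] = 1048576
    f.attrs["num_characters"] = 64

# On the other machine:
model = keras.models.load_model("model.h5")
with h5py.File("model.h5", "r") as f:
    dataset_file = f.attrs["dataset_file"]
    num_characters = int(f.attrs["num_characters"])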

Does Tensorflow use only one hot encoding to store labels?

I have just started working with Tensorflow. With Caffe it was super practical to read in the data in an efficient manner, but with Tensorflow I see that I have to write the data loading process myself: creating TFRecords, the batching, the multiple threads, handling those threads, etc. So I started with an example, Inception v3, since it handles the part that reads in the data. I am new to Tensorflow and relatively new to Python, so I feel like I don't understand exactly what is going on with this part (I mean, yes, it extends the labels with label_index repeated once per file, but why?). Is it creating a one-hot encoding for the labels? Do we have to? Why doesn't it just extend by the number of files, since each file has one label? Thanks.
labels.extend([label_index] * len(filenames))
texts.extend([text] * len(filenames))
filenames.extend(filenames)
The whole code is here: https://github.com/tensorflow/models/tree/master/research/inception
The part mentioned is under data/build_image_data.py and builds image dataset from an existing dataset as images stored under folders (where foldername is the label): https://github.com/tensorflow/models/blob/master/research/inception/inception/data/build_image_data.py
Putting together what we discussed in the comments:
You have to one-hot encode because the network architecture requires it, not because TensorFlow demands it. The network is an N-class classifier, so the final layer will have one neuron per class, and you'll train the network to activate the neuron matching the class the sample belongs to. One-hot encoding the label is the first step in doing this.
About the human-readable labels: the code you're referring to is located in the _find_image_files function, which is in turn used by _process_dataset to transform the dataset from a set of folders into a set of TFRecord files, a convenient input format for Tensorflow.
The human-readable label string is included as a feature in the Examples inside the TFRecord files as an 'extra' (probably to simplify visualization of intermediate results during training); it is not strictly necessary for the dataset and will not be used in any way in the actual optimization of the network's parameters.
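For illustration, this is all that the one-hot step amounts to (the class count and index below are made up):

import tensorflow as tf

num_classes = 5
label_index = 2
print(tf.one_hot(label_index, depth=num_classes).numpy())   # [0. 0. 1. 0. 0.]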
