Loading images with multiple numeric values for neural network - python

I have a dataset of images and two regression values per image in a CSV file. For example, "img1.jpg" has two numeric values, "x_cord" and "y_cord", stored in annotation.csv. I want to train my neural network on the images together with these two values from the CSV file, but I'm not able to load them. Could someone suggest a way to load both together and feed them as input to the neural network?
Many thanks.
I'm not able to load them together: I have tried flow_from_dataframe, but it only takes one numeric value, so I don't know how to load multiple numeric values along with an image.
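A hedged sketch of one way to do this, assuming the CSV has a "filename" column alongside "x_cord" and "y_cord" and that the images live in an "images/" folder: flow_from_dataframe accepts a list of columns for y_col when class_mode="raw", so both values come back together as one target array per batch.

import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed CSV layout: a "filename" column plus the two regression targets.
df = pd.read_csv("annotation.csv")

datagen = ImageDataGenerator(rescale=1.0 / 255)

# class_mode="raw" passes the y_col columns through as numeric values,
# so listing two columns yields a (batch_size, 2) target per batch.
train_gen = datagen.flow_from_dataframe(
    dataframe=df,
    directory="images/",           # assumed image folder
    x_col="filename",              # assumed column holding the image file names
    y_col=["x_cord", "y_cord"],    # both regression targets together
    class_mode="raw",
    target_size=(224, 224),
    batch_size=32,
)

# model.fit(train_gen, epochs=10)  # the model's final layer should then be Dense(2)

The model's output layer would then need two units (one per coordinate) with a regression loss such as mean squared error.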

Related

Convolutional neural network (CNN) on decimal values

I have a lot of CSV files, each containing approximately 1000 rows and 2 columns, where the data looks like this:
21260.35679 0.008732499
21282.111 0.008729349
21303.86521 0.008721652
21325.61943 0.008708224
These two columns are the features, and the output will be a device name. Each CSV file holds data from a specific device at different times, and there are many devices. What I am trying to do is train on this data and then classify the device name using a CNN. If any incoming data falls outside of the trained observations, it should be classified as an anomaly.
I am trying to convert those values to an image matrix so that I can train a CNN on this data. What concerns me is that the second column contains float values that are less than 1 and close to zero. If I convert them to integers they become zero, and if all the values become zero the data no longer makes sense.
How can I solve this? And is it even possible to use a CNN on these datasets?
From your description, your problem seems to be sequence classification.
You have many temporal sequences. Each sequence has the same number of 2D elements and is associated with a device. Given a sequence as input, you want to predict the corresponding device.
This kind of temporal dependency is better captured by RNNs. I would suggest taking a look at LSTMs.
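A minimal sketch of such a model in Keras, assuming each CSV file is one sequence of about 1000 timesteps with 2 features and num_devices known devices (the names and sizes here are placeholders):

import tensorflow as tf

num_devices = 10               # assumption: number of known devices
seq_len, n_features = 1000, 2  # one sequence per CSV file, its 2 columns as features

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(seq_len, n_features)),
    tf.keras.layers.LSTM(64),                                   # captures the temporal dependencies
    tf.keras.layers.Dense(num_devices, activation="softmax"),   # one output per device
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# X: (num_sequences, 1000, 2) float array, y: (num_sequences,) integer device indices
# model.fit(X, y, epochs=20, validation_split=0.2)

Note that the float values can be fed in as they are (scaling helps, but no conversion to integers or images is needed), which also sidesteps the concern about the near-zero second column.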

How to load flattened 3D data from .dat files in a directory for training a neural network in TensorFlow

I have a data set containing 3-dimensional labeled numerical (float) data stored in two directories named, say, NEGATIVE and POSITIVE, respectively. In each directory, each data point (a vector, actually) is written to an individual .dat (or .txt) file. Each original data vector is 3D, meaning that it has a shape of, for example, 64*64*64, which is flattened into a 1D array before being written to the file (the file then contains one column and 64*64*64 rows).
I want to build a neural network in Python with TensorFlow (Keras) that takes these .dat files as inputs (just like taking images, but with numerical data instead). However, there doesn't seem to be a function that facilitates this. I thought about loading all the data files at once and manually reshaping them into a training list and a test list of numpy arrays (each element of the list having shape 64*64*64, i.e. a single data vector), together with lists of their labels, but that would require an insane amount of RAM (the data set is really large). So I wonder if there is a way to load them in small batches for the network to train on (to be used in the .fit() method), just like one would do when loading images with 'image_dataset_from_directory'?
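One hedged way to do this is to wrap the .dat files in a tf.data.Dataset built from a Python generator, so files are read lazily, batch by batch, instead of all at once. The directory names and the 64*64*64 shape come from the description above; the rest (glob pattern, batch size) is an assumption:

import pathlib
import numpy as np
import tensorflow as tf

def sample_generator():
    # NEGATIVE/ and POSITIVE/ each hold one flattened 64*64*64 vector per .dat file.
    for label, folder in enumerate(["NEGATIVE", "POSITIVE"]):
        for path in pathlib.Path(folder).glob("*.dat"):
            vec = np.loadtxt(path, dtype=np.float32)       # one column, 64*64*64 rows
            yield vec.reshape(64, 64, 64, 1), label

dataset = tf.data.Dataset.from_generator(
    sample_generator,
    output_signature=(
        tf.TensorSpec(shape=(64, 64, 64, 1), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
).shuffle(256).batch(8).prefetch(tf.data.AUTOTUNE)

# model.fit(dataset, epochs=10)  # only the current batch of files is held in memory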

Structure of data for multilabel Classification problem

I am working on a prediction problem where a user has access to multiple Targets, and each access has a separate row. Below is the data:
df=pd.DataFrame({"ID":[12567,12567,12567,12568,12568],"UnCode":[LLLLLLL,LLLLLLL,LLLLLLL,KKKKKK,KKKKKK],
"CoCode":[1000,1000,1000,1111,1111],"CatCode":[1,1,1,2,2],"RoCode":["KK","KK","KK","MM","MM"],"Target":[12,4,6,1,6]
})
Here ID is unique but can be repeated if a user has accessed multiple targets, and a target can be repeated as well if it is accessed by different IDs.
I have converted this data to one-hot encoding (OHE) and used it for prediction with binary relevance, where my X is constant and the target varies.
The problem I am facing with this approach is that the data becomes sparse, and the number of features in my original data is around 1300.
Can someone suggest whether this approach is correct or not, and what other methods/approaches I can use for this type of problem? Also, can this problem be treated as multilabel classification?
Below is the input data for the model.
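As a hedged sketch of the multilabel framing (reusing the df built above and scikit-learn's MultiLabelBinarizer; the grouping and encoding choices are assumptions, not a definitive recipe), each ID becomes one sample whose label set is every Target it accessed:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# df is the frame constructed in the snippet above.
# One row per ID: the descriptive features stay constant, the accessed targets form a label set.
grouped = df.groupby("ID").agg({
    "UnCode": "first", "CoCode": "first", "CatCode": "first", "RoCode": "first",
    "Target": lambda s: list(s),
})

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(grouped["Target"])              # multilabel indicator matrix, one column per target
X = pd.get_dummies(grouped.drop(columns="Target"))    # one-hot encode the categorical features

# Each row of Y marks which targets that ID accessed; any multilabel-capable classifier
# (or binary relevance over the columns) can then be trained on (X, Y).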

Train Keras Model on large number of .mat files

I want to train a neural network model in Python using Keras. The data I have is a bunch of .mat files that contain data associated with ECG signals.
All the .mat files have the same structure but different values. However, there are some differences in array sizes between the files. For example, one file contains an array called "P_Signal" with a size of 9000, while another file has it with a size of 18000.
As far as I know, I should prepare a CSV file containing all the necessary attributes plus the label, one per column, so that Keras can understand it.
Unfortunately, this is impossible due to the large arrays and fields inside the .mat files (a CSV file cannot hold more than 16,384 columns, and I have some arrays of 18000 values each), so I need to find another way to load this data into Keras.
Appreciate your help a lot.
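A hedged sketch of one way to skip the CSV step entirely: load each .mat file with scipy.io.loadmat inside a Python generator and feed it to Keras through tf.data, padding or truncating "P_Signal" to a common length. The padding length, the file pattern, and the idea that each file stores its label under a "label" key are assumptions:

import glob
import numpy as np
import scipy.io
import tensorflow as tf

TARGET_LEN = 9000  # assumption: pad/truncate every "P_Signal" array to a common length

def mat_generator(pattern="ecg_data/*.mat", label_key="label"):
    # Hypothetical layout: each .mat file holds a "P_Signal" array plus an integer label.
    for path in glob.glob(pattern):
        mat = scipy.io.loadmat(path)
        signal = np.asarray(mat["P_Signal"], dtype=np.float32).ravel()
        signal = np.pad(signal, (0, max(0, TARGET_LEN - len(signal))))[:TARGET_LEN]
        yield signal, int(mat[label_key].squeeze())

dataset = tf.data.Dataset.from_generator(
    mat_generator,
    output_signature=(
        tf.TensorSpec(shape=(TARGET_LEN,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
).batch(16).prefetch(tf.data.AUTOTUNE)

# model.fit(dataset, epochs=10)  # no intermediate CSV and no 16,384-column limit involved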

Does Tensorflow use only one hot encoding to store labels?

I have just started working with Tensorflow. With Caffe it was very practical to read in the data efficiently, but with Tensorflow I see that I have to write the data loading process myself: creating TFRecords, batching, multiple threads, handling those threads, and so on. So I started with an example, Inception v3, since it handles the part that reads in the data. I am new to Tensorflow and relatively new to Python, so I don't understand exactly what is going on in this part (yes, it extends the labels with label_index repeated for the number of files, but why? Is it creating a one-hot encoding for the labels? Do we have to? Why doesn't it just extend by the number of files, since each file has a label?). Thanks.
labels.extend([label_index] * len(filenames))
texts.extend([text] * len(filenames))
filenames.extend(filenames)
The whole code is here: https://github.com/tensorflow/models/tree/master/research/inception
The part mentioned is under data/build_image_data.py, which builds an image dataset from an existing set of images stored under folders (where the folder name is the label): https://github.com/tensorflow/models/blob/master/research/inception/inception/data/build_image_data.py
Putting together what we discussed in the comments:
You have to one-hot encode because the network architecture requires you to, not because Tensorflow demands it. The network is an N-class classifier, so the final layer will have one neuron per class, and you'll train the network to activate the neuron matching the class the sample belongs to. One-hot encoding the label is the first step in doing this.
About the human-readable labels: the code you're referring to is located in the _find_image_files function, which in turn is used by _process_dataset to transform the dataset from a set of folders into a set of TFRecord files, which are a convenient input format for Tensorflow.
The human-readable label string is included as a feature in the Examples inside the TFRecord files as an 'extra' (probably to simplify visualization of intermediate results during training); it is not strictly necessary for the dataset and will not be used in any way in the actual optimization of the network's parameters.
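As a small illustration of the one-hot step described above (not the inception code itself; the 4-class setup is hypothetical):

import tensorflow as tf

# Integer class indices for three samples, with 4 classes in total.
label_indices = tf.constant([0, 2, 3])

# One neuron per class in the final layer, so each label becomes a length-4 vector.
one_hot = tf.one_hot(label_indices, depth=4)
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]]

# Alternatively, Keras' sparse_categorical_crossentropy accepts the integer indices
# directly, so the one-hot conversion does not always have to be done by hand.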
