I want to train a simple classification neural network which can classify the data into 2 types, i.e. true or false.
I have 29 data along with respective labels available with me. I want to parse this data to form a dataset which can be fed into model.fit() to train the neural network.
Please suggest me how can I arrange the data with their respective labels. What to use, whether lists, dictionary, array?
There are values of 2 fingerprints separated by '$' sign and whether they match or not (i.e. true or false) is separated by another '$' sign.
A Fingerprint has 63 features separated by ','(comma) sign.
So, Each line has the data of 2 fingerprints and true/false data.
I have below data with me in following format:
File Name : thumb_and_index.txt
239,1,255,255,255,255,2,0,130,3,1,105,24,152,0,192,126,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,128,0,192,0,192,0,0,0,0,0,0,0,147,18,19,0,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,101,22,154,0,240,30,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,128,0,0,0,0,0,0,0,0,0,0,0,0,71,150,212,$true
239,1,255,255,255,255,2,0,130,3,1,82,23,146,0,128,126,0,14,0,6,0,6,0,2,0,0,0,0,0,2,0,2,0,2,0,2,0,2,0,6,128,6,192,14,224,30,255,254,0,0,0,0,0,0,207,91,180,0,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,81,28,138,0,241,254,128,6,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,128,0,128,2,128,2,192,6,224,6,224,62,0,0,0,0,0,0,0,0,0,0,0,0,13,62,$true
239,1,255,255,255,255,2,0,130,3,1,92,29,147,0,224,0,192,0,192,0,128,0,128,0,128,0,128,0,128,0,128,0,128,0,192,0,192,0,224,0,224,2,240,2,248,6,255,14,76,16,0,0,0,0,19,235,73,181,0,0,0,0,$239,192,255,255,255,255,2,0,130,3,1,0,0,0,0,248,30,240,14,224,0,224,0,128,0,128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,6,128,14,192,14,252,30,0,0,0,0,0,0,0,0,0,0,0,0,158,46,$false
239,1,255,255,255,255,2,0,130,3,1,0,0,0,0,128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,128,0,0,0,0,0,0,0,217,85,88,0,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,90,27,135,0,252,254,224,126,128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,190,148,$false
239,1,255,255,255,255,2,0,130,3,1,89,22,129,0,129,254,128,254,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,2,0,6,0,6,128,14,192,14,224,14,0,0,0,0,0,0,20,20,43,0,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,91,17,134,0,0,126,0,30,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,2,0,6,0,6,0,30,192,62,224,126,224,254,0,0,0,0,0,0,0,0,0,0,0,0,138,217,$true
239,1,255,255,255,255,2,0,130,3,1,71,36,143,0,128,254,0,14,0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,2,0,2,0,6,80,18,0,0,0,0,153,213,11,95,83,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,94,30,140,0,129,254,0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,192,6,0,0,0,0,0,0,0,0,0,0,0,0,54,13,$true
239,1,255,255,255,255,2,0,130,3,1,66,42,135,0,255,254,1,254,0,14,0,6,0,6,0,6,0,6,0,6,0,2,0,2,0,2,0,2,0,2,0,2,0,6,0,6,0,6,0,0,0,0,0,0,225,165,64,152,172,88,0,0,$239,1,255,255,255,255,2,0,130,3,1,62,29,137,0,255,254,249,254,240,6,224,2,224,0,224,0,224,0,224,0,224,0,224,0,224,0,240,0,240,0,240,0,240,0,240,0,240,2,0,0,0,0,0,0,0,0,0,0,0,0,0,98,$true
239,1,255,255,255,255,2,0,130,3,1,83,31,142,0,255,254,128,254,0,30,0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,128,2,192,2,192,2,192,2,192,6,0,0,0,0,0,0,146,89,117,12,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,84,14,154,0,0,2,0,2,0,2,0,2,0,2,0,6,0,14,128,30,192,62,255,254,255,254,255,254,255,254,255,254,255,254,255,254,255,254,0,0,0,0,0,0,0,0,0,0,0,0,0,31,$false
239,1,255,255,255,255,2,0,130,3,1,66,41,135,0,255,254,248,62,128,30,0,14,0,14,0,14,0,14,0,14,0,14,0,6,0,6,0,6,0,14,0,14,0,14,192,14,224,14,0,0,0,0,0,0,105,213,155,107,95,23,0,0,$239,1,255,255,255,255,2,0,130,3,1,61,33,133,0,255,254,255,254,224,62,192,6,192,6,192,6,192,6,192,6,192,6,224,6,224,6,224,6,224,6,224,6,224,6,224,6,224,6,0,0,0,0,0,0,0,0,0,0,0,0,0,62,$false
239,1,255,255,255,255,2,0,130,3,1,88,31,119,0,0,14,0,14,0,6,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,100,133,59,150,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,97,21,137,0,128,14,0,6,0,2,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,6,0,0,0,0,0,0,0,0,0,0,0,80,147,210,$true
239,1,255,255,255,255,2,0,130,3,1,85,21,137,0,224,14,192,6,192,6,128,6,0,6,0,6,0,6,0,6,0,6,0,6,0,6,0,6,0,6,128,14,192,30,224,126,224,254,0,0,0,0,0,0,79,158,178,0,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,89,25,134,0,240,6,128,2,0,2,0,2,0,2,0,2,0,2,0,2,0,2,0,2,128,2,128,2,192,2,192,6,224,6,240,14,240,30,0,0,0,0,0,0,0,0,0,0,0,0,72,31,$true
239,1,255,255,255,255,2,0,130,3,1,90,25,128,0,241,254,0,30,0,6,0,2,0,2,0,2,0,2,0,2,0,2,0,2,0,2,0,2,0,2,0,2,0,6,0,6,192,14,0,0,0,0,0,0,225,153,189,0,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,96,12,153,0,192,14,128,6,128,6,128,6,0,6,128,2,128,2,128,2,128,6,128,6,192,14,240,30,255,254,255,254,255,254,255,254,255,254,0,0,0,0,0,0,0,0,0,0,0,0,0,18,$false
239,1,255,255,255,255,2,0,130,3,1,96,22,142,0,255,254,254,14,128,2,128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,192,2,0,0,0,0,0,0,18,25,100,0,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,76,24,145,0,224,2,192,0,128,0,128,0,128,0,128,0,128,0,128,0,128,0,224,2,240,126,255,254,255,254,255,254,255,254,255,254,255,254,0,0,0,0,0,0,0,0,0,0,0,0,0,145,$false
239,1,255,255,255,255,2,0,130,3,1,71,33,117,0,129,254,0,30,0,14,0,14,0,6,0,6,0,2,0,2,0,6,0,6,0,6,0,6,0,6,128,14,192,14,240,30,240,254,0,0,0,0,0,0,235,85,221,57,17,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,76,31,112,0,255,254,0,62,0,62,0,62,0,14,0,6,0,6,0,6,0,6,0,6,0,6,0,6,0,6,0,6,0,6,128,14,224,62,0,0,0,0,0,0,0,0,0,0,0,0,30,170,$true
239,1,255,255,255,255,2,0,130,3,1,64,29,117,0,128,30,0,30,0,30,0,14,0,6,0,6,0,6,0,6,0,6,0,14,0,14,0,14,128,30,192,30,224,62,240,254,255,254,0,0,0,0,0,0,99,80,119,149,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,72,18,132,0,128,2,0,0,0,0,128,0,128,0,128,0,128,0,192,2,224,2,240,14,252,14,255,254,255,254,255,254,255,254,255,254,255,254,0,0,0,0,0,0,0,0,0,0,0,0,0,14,$false
239,1,255,255,255,255,2,0,130,3,1,82,16,132,0,255,254,255,254,255,254,240,30,224,14,224,14,192,6,192,6,192,2,192,2,192,2,192,2,192,2,192,2,192,1,224,2,240,6,0,0,0,0,0,0,215,21,0,0,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,85,23,130,0,240,30,192,14,128,14,128,6,128,2,128,2,128,2,128,2,128,2,128,0,192,0,192,2,192,2,224,2,224,6,240,6,248,30,0,0,0,0,0,0,0,0,0,0,0,0,0,62,$true
239,1,255,255,255,255,2,0,130,3,1,100,28,141,0,255,254,255,254,224,14,192,14,192,6,192,2,128,2,128,2,128,2,0,2,0,2,0,2,0,2,0,6,0,6,0,6,192,14,0,0,0,0,0,0,42,88,87,169,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,95,31,134,0,255,254,240,254,224,0,192,0,192,0,192,0,128,0,128,0,128,0,128,0,128,0,128,0,128,0,128,0,128,0,192,2,192,6,0,0,0,0,0,0,0,0,0,0,0,0,0,182,$true
239,1,255,255,255,255,2,0,130,3,1,88,35,121,0,255,14,240,6,224,7,192,2,192,2,192,2,192,2,192,2,192,2,192,2,192,2,224,2,224,2,224,2,224,2,224,2,224,6,0,0,0,0,0,0,36,81,48,225,153,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,81,43,112,0,252,62,248,14,224,2,192,2,192,2,192,0,192,0,192,0,192,0,192,0,192,0,192,0,224,0,224,2,224,2,224,2,224,6,0,0,0,0,0,0,0,0,0,0,0,0,0,76,$true
239,1,255,255,255,255,2,0,130,3,1,103,24,144,0,255,254,192,14,192,6,128,2,128,0,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,6,128,6,128,6,192,30,224,254,0,0,0,0,0,0,19,82,111,0,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,98,11,149,0,255,2,255,0,252,0,240,0,240,0,240,0,248,0,248,0,248,0,252,0,254,0,254,2,254,30,254,30,254,30,254,30,254,30,0,0,0,0,0,0,0,0,0,0,0,0,0,114,$false
239,1,255,255,255,255,2,0,130,3,1,92,23,123,0,255,254,255,30,252,6,240,2,224,0,192,0,192,0,192,0,224,0,224,0,224,0,224,2,224,2,224,2,224,2,224,6,224,6,0,0,0,0,0,0,35,161,251,0,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,52,37,125,0,255,254,255,254,224,254,192,30,192,14,128,14,128,14,128,14,128,14,128,14,128,14,128,14,128,6,0,2,0,2,0,2,192,2,0,0,0,0,0,0,0,0,0,0,0,0,0,110,$false
239,1,255,255,255,255,2,0,130,3,1,103,19,143,0,255,254,254,254,0,126,0,126,0,126,0,62,0,62,0,126,0,126,0,126,0,126,0,126,0,126,0,126,0,254,0,254,0,254,0,0,0,0,0,0,38,168,0,0,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,90,30,141,0,255,254,193,254,128,62,0,6,0,2,0,2,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,6,0,254,0,0,0,0,0,0,0,0,0,0,0,0,53,211,$true
239,1,255,255,255,255,2,0,130,3,1,93,34,137,0,255,254,225,254,192,14,192,2,192,2,192,2,192,2,192,0,192,0,192,0,192,0,192,0,192,0,224,2,224,2,240,6,240,14,0,0,0,0,0,0,101,4,252,164,28,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,88,31,140,0,255,254,192,62,192,14,192,14,0,6,0,6,0,6,0,6,0,2,0,2,0,2,0,2,128,2,128,6,192,6,224,14,240,30,0,0,0,0,0,0,0,0,0,0,0,0,10,97,$true
239,1,255,255,255,255,2,0,130,3,1,57,50,107,0,248,2,248,0,248,0,224,0,224,0,192,0,192,0,192,0,128,0,128,0,128,0,128,0,192,0,192,0,192,0,192,2,224,2,0,0,0,0,0,0,34,10,146,27,176,73,73,82,$239,1,255,255,255,255,2,0,130,3,1,54,42,111,0,255,254,255,254,254,126,252,6,240,2,224,2,224,2,224,0,224,0,224,0,224,0,224,0,224,0,224,0,224,0,192,0,192,0,0,0,0,0,0,0,0,0,0,0,0,0,0,225,$true
239,1,255,255,255,255,2,0,130,3,1,103,18,142,0,241,254,224,254,128,126,128,126,0,62,0,30,0,30,0,14,0,14,0,14,0,14,0,14,0,14,0,14,0,14,0,14,0,14,0,0,0,0,0,0,209,21,0,0,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,103,10,139,0,255,254,255,254,255,254,225,254,192,254,192,254,192,126,128,62,0,30,0,14,0,14,0,14,0,14,0,14,0,14,0,14,0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,163,$true
239,1,255,255,255,255,2,0,130,3,1,85,21,132,0,248,2,248,2,248,0,240,0,240,0,240,0,240,0,240,0,240,0,240,0,248,0,248,0,252,0,252,0,252,0,254,2,255,6,0,0,0,0,0,0,94,23,110,0,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,76,26,133,0,129,254,128,62,0,62,0,62,0,62,0,62,0,30,0,30,0,30,0,30,0,30,0,30,0,30,0,30,128,30,192,14,224,14,0,0,0,0,0,0,0,0,0,0,0,0,222,36,$true
239,1,255,255,255,255,2,0,130,3,1,87,28,141,0,255,254,255,254,224,254,224,126,224,126,0,14,0,2,0,2,0,2,0,0,0,0,0,0,0,2,0,2,0,2,0,2,0,2,0,0,0,0,0,0,143,231,78,148,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,89,30,139,0,255,254,248,254,240,30,224,14,224,14,192,6,192,2,128,0,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,2,0,2,0,0,0,0,0,0,0,0,0,0,0,0,26,213,$true
239,1,255,255,255,255,2,0,130,3,1,93,25,136,0,255,254,193,254,0,254,0,62,0,30,0,30,0,14,0,14,0,14,0,14,0,14,0,14,0,14,0,14,0,14,0,14,0,14,0,0,0,0,0,0,148,210,91,0,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,95,23,145,0,254,254,252,30,240,2,224,0,224,0,224,0,192,0,192,0,192,0,192,6,192,6,192,6,192,6,192,6,192,6,224,6,224,14,0,0,0,0,0,0,0,0,0,0,0,0,0,30,$false
239,1,255,255,255,255,2,0,130,3,1,85,27,138,0,255,254,240,126,224,30,192,14,0,14,0,14,0,14,0,14,0,14,0,14,0,14,0,14,0,30,0,30,0,30,192,62,224,62,0,0,0,0,0,0,85,17,74,101,0,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,105,19,144,0,192,254,128,126,0,62,0,30,128,30,128,30,128,14,192,14,192,14,192,14,224,14,224,14,240,14,240,14,248,14,254,30,255,30,0,0,0,0,0,0,0,0,0,0,0,0,0,254,$false
239,1,255,255,255,255,2,0,130,3,1,86,37,116,0,255,254,254,14,252,6,248,2,240,0,240,0,224,0,192,0,192,0,128,0,0,0,0,2,0,2,0,2,0,2,0,6,0,6,0,0,0,0,0,0,94,157,90,28,219,0,0,0,$239,1,255,255,255,255,2,0,130,3,1,99,26,130,0,255,254,248,14,240,2,224,0,192,0,192,0,192,0,128,0,192,0,192,0,192,0,192,0,224,0,240,2,248,6,255,254,255,254,0,0,0,0,0,0,0,0,0,0,0,0,0,213,$true
I have used this code trying to parse the data:
import tensorflow as tf
import os
import array as arr
import numpy as np
import json
os.environ["TF_CPP_MIN_LOG_LEVEL"]="2"
f= open("thumb_and_index.txt","r")
dataset = []
if f.mode == 'r':
contents =f.read()
#list of lines
lines = contents.splitlines()
print("No. of lines : "+str(len(lines)))
for line in lines:
words = line.split(',')
mainlist = []
list = []
flag = 0
for word in words:
print("word : " + word)
if '$' in word:
if flag == 1:
mainlist.append(list)
mainlist.append(word[1:])
dataset.append(mainlist)
else:
mainlist.append(list)
del list[0:len(list)]
list.append(int(word[1:]))
flag = flag + 1
else:
list.append(int(word))
print(json.dumps(dataset, indent = 4))
I want to feed the parsed data into model.fit() using keras in tensorflow(python).
Also I want to ask about the neural network. How many layers and nodes should I keep in my neural network? Suggest a starting point.
there's a plenty ways to do that (formating the data), you can create 2D matrix for the data that has 62 columns for the data and another array that handles the results for this data (X_data,Y_data).
also you can use pandas to create dataframes for the data (same as arrays, bu it's better to show and visualize the data).
example to read the textfile into pandas dataframe
import pandas
df = pandas.read_table('./input/dists.txt', delim_whitespace=True, names=('A', 'B', 'C'))
split the data into x&y then fit it in your model
for the size of the hidden layers in your neural, it's well known that the more layers you add the more accurate results you get (without considering overfitting) , so that depends on your data.
I suggest you to start with a sequential layers as follows (62->2048->1024->512->128->64->sigmoid)
The best approach, especially assuming that dataset is large, is to use the tf.data dataset. There's a CSV reader built right in. The dataset api provides all the functionality you need to preprocess the dataset, it provides built-in multi-core processing, and quite a bit more.
Once you have the dataset built Keras will accept it as an input directly, so fit(my_dataset, inputs=... outputs=...).
The structure of the dataset api takes a little learning, but it's well worth it. Here's the primary guide with lots of examples:
https://www.tensorflow.org/guide/datasets
Scroll down to the section on 'Import CSV data' for poignant examples.
Here's a nice example of using the dataset API with keras: How to Properly Combine TensorFlow's Dataset API and Keras?