Setting up data for keras using pandas - python

How do I read data from a CSV file and turn it into my training data and labels?
TRAIN_DATA = "C:\\Users\jackt\Desktop\machine_learning_coursework\MY_TRAIN.csv"
TEST_DATA = "C:\\Users\jackt\Desktop\machine_learning_coursework\MY_LABELS.csv"
train_file_path = tf.keras.utils.get_file("MY_TRAIN.csv", TRAIN_DATA)
test_file_path = tf.keras.utils.get_file("MY_LABELS.csv", TEST_DATA)
np.set_printoptions(precision=3, suppress=True)
My output error is:
Exception: URL fetch failure on C:\Users\jackt\Desktop\machine_learning_coursework\MY_TRAIN.csv: None -- unknown url type: c
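`tf.keras.utils.get_file` downloads a file from the URL passed as its `origin` argument and caches it, so it is meant for remote files. A Windows path such as `C:\...` gets parsed as a URL whose scheme is `c`, which is where "unknown url type: c" comes from. For files already on disk you can skip `get_file` and hand the paths straight to pandas. The sketch below assumes that MY_TRAIN.csv holds one feature row per sample, that MY_LABELS.csv holds the matching labels in the same row order, and that both files have a header row. `load_training_data` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def load_training_data(train_csv, labels_csv):
    """Read features and labels from two local CSV files (no download step).

    Assumes both files have a header row and list samples in the same order.
    """
    x = pd.read_csv(train_csv).to_numpy(dtype=np.float32)
    y = pd.read_csv(labels_csv).to_numpy()
    if y.ndim == 2 and y.shape[1] == 1:
        y = y.ravel()  # a single label column becomes a 1-D label vector
    if len(x) != len(y):
        raise ValueError(f"{len(x)} feature rows but {len(y)} label rows")
    return x, y


# Raw strings (r"...") stop Windows backslashes from being read as escape sequences.
TRAIN_DATA = r"C:\Users\jackt\Desktop\machine_learning_coursework\MY_TRAIN.csv"
LABEL_DATA = r"C:\Users\jackt\Desktop\machine_learning_coursework\MY_LABELS.csv"
# x_train, y_train = load_training_data(TRAIN_DATA, LABEL_DATA)
```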

Read the CSV file, create two empty lists, append the image and label data to the respective lists, convert them into NumPy arrays and save them.
At training time, load the arrays back and shuffle them, for example with scikit-learn. There are numerous tutorials for that.
import pandas as pd
from os.path import join
import cv2
import numpy as np
from keras.utils import to_categorical
train_df = pd.read_csv('train.csv')
# Read columns (multiple ways to do it)
images = train_df["images"]
labels = train_df["labels"]
train_data = []
train_label = []
for (image, label) in zip(images, labels):
    img = cv2.imread(join("train_dir", image))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # if you want to resize images to a particular shape
    img = cv2.resize(img, (160, 160), interpolation=cv2.INTER_AREA)
    # Append images and labels
    train_data.append(img)
    train_label.append(label)
train_data = np.array(train_data)
train_label = np.array(train_label)
train_label = to_categorical(train_label)
np.save("train_data.npy", train_data)
np.save("train_label.npy", train_label)
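For the training step the answer suggests loading the arrays and shuffling them with scikit-learn (`sklearn.utils.shuffle(x, y)` shuffles several arrays consistently). The sketch below does the same with a plain NumPy permutation so it needs no extra dependency. The default file names are the ones saved above; `load_shuffled` is a hypothetical helper.

```python
import numpy as np


def load_shuffled(data_file="train_data.npy", label_file="train_label.npy", seed=0):
    """Load the saved arrays and shuffle images and labels with the same permutation."""
    x = np.load(data_file)
    y = np.load(label_file)
    if len(x) != len(y):
        raise ValueError(f"{len(x)} images but {len(y)} labels")
    order = np.random.default_rng(seed).permutation(len(x))
    # Indexing both arrays with the same order keeps every image paired with its label.
    return x[order], y[order]
```

Using one shared index permutation, rather than shuffling each array separately, is what keeps images and labels aligned.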

Related

What is the solution for this error: "ValueError: If evaluating from data tensors, you should specify the `steps` argument"?

I am testing my deep learning model and wrote this code:
from keras.models import load_model
classifier = load_model('Trained_model.h5')
classifier.evaluate()
# Prediction of single image
import numpy as np
from keras.preprocessing import image
img_name = input('Enter Image Name: ')
image_path = './predicting_data/test_set/{}'.format(img_name)
print('')
After running it, I get this error:
ValueError: If evaluating from data tensors, you should specify the `steps` argument.
Note: ./predicting_data/test_set is the path of my test dataset, which has subfolders A, B, C, ... through Z containing images.
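`evaluate()` needs data to evaluate on. When the test images come from a generator or dataset (for example one built from the class subfolders of ./predicting_data/test_set), Keras cannot always tell how many batches make up one pass, and that is what `steps` supplies: the number of images divided by the batch size, rounded up. A minimal sketch for computing it, assuming the images sit inside the class subfolders and use common image extensions; `count_images` and `evaluation_steps` are hypothetical helpers.

```python
import math
import os

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}


def count_images(root):
    """Count image files anywhere below root (e.g. root/A, root/B, ..., root/Z)."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += sum(1 for name in filenames if os.path.splitext(name)[1].lower() in IMAGE_EXTS)
    return total


def evaluation_steps(num_samples, batch_size):
    """Batches needed to see every sample once; the last batch may be partial."""
    return math.ceil(num_samples / batch_size)
```

With a generator `test_generator` (assumed here, e.g. from `ImageDataGenerator.flow_from_directory` with `batch_size=32`), the call would look like `classifier.evaluate(test_generator, steps=evaluation_steps(count_images(test_dir), 32))`. Older standalone Keras versions use `classifier.evaluate_generator(test_generator, steps=...)` for generators.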
Working code to predict the class of an image by loading the saved model is shown below:
import os
import tensorflow as tf
from tensorflow.keras.preprocessing import image
Test_Dir = '/Dogs_Vs_Cats_Small/test/cats'
New_Model = tf.keras.models.load_model('Dogs_Vs_Cats.h5')
New_Model.summary()
Image_Path = os.path.join(Test_Dir, 'cat.1500.jpg')
Img = image.load_img(Image_Path, target_size = (150,150))
Img_Array = image.img_to_array(Img)
Img_Array = Img_Array/255.0
Img_Array = tf.reshape(Img_Array, (-1,150,150,3))
Predictions = New_Model.predict(Img_Array)
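# argmax over the class axis; without axis, tf.argmax reduces over axis 0 (the batch axis)
Label = tf.argmax(Predictions, axis=-1)
print(Label.numpy()[0])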
The final line gives the predicted class index for the image.
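`tf.argmax` reduces over axis 0 when no `axis` is given. For a `(1, num_classes)` prediction batch that is the batch axis, so every entry comes out as 0. The NumPy equivalent below, on a made-up three-class prediction, shows the difference.

```python
import numpy as np

# One image, three classes: the model is most confident about class 1 (made-up numbers).
predictions = np.array([[0.1, 0.7, 0.2]])

over_batch = np.argmax(predictions, axis=0)     # reduces over the batch axis
over_classes = np.argmax(predictions, axis=-1)  # reduces over the class axis

print(over_batch, over_classes)  # [0 0 0] [1]
```

If the model instead ends in a single sigmoid unit (a common setup for two classes such as cats vs. dogs), argmax over one column is always 0, and thresholding the output, e.g. at 0.5, is the usual way to get the class.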

Tensorflow Pipeline for images and numpy files

I am working with TensorFlow 2.0.0 and am trying to set up an efficient pipeline for feeding in ~90,000 PNG images of size (256, 256, 3) and their labels, which are NumPy arrays of size (256, 256), for an image segmentation problem. These images and labels won't fit into memory all at once.
The data are stored in a directory like this:
'C:/Users/user/Documents/data/ims/' #png images
'C:/Users/user/Documents/data/masks/' #img labels/masks
The file names are the same apart from the extension, so, for example, "test1.png" and "test1.npy" are an image/label pair.
The data are not split into training, validation, and test subsets yet.
I need to get to a point in which I have both the images and labels split into train, validation, and testing subsets, and also have a means to feed the data into a model for training.
I was following this guide here, but I could not figure out how to handle the NumPy files in the get_label function.
I thought I could write a function that splits the data into subsets using the file names alone, and then loads each batch on the fly from those file names, but I can't figure out how to do this efficiently.
I'm currently doing the following, which either fails because the data are too big or is too slow because there are so many files to load into memory; neither is a viable solution.
import tensorflow as tf
import numpy as np
import glob2 as glob
from imageio import imread
base = '/mnt/projects/CNN_Data/clean_data/'
image_path = sorted(glob.glob(base + 'ims/*.png'))
label_path = sorted(glob.glob(base + 'masks/*.npy'))
images = [imread(img).astype(np.float32)/255.0 for img in image_path]
labels = [np.load(path) for path in label_path]
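The idea of splitting by file names alone works because nothing has to be loaded to decide which subset a file belongs to. A minimal sketch, assuming a 80/10/10 split and a fixed seed for reproducibility; `split_by_filename` is a hypothetical helper. Since each mask shares its image's stem, the mask lists for the three subsets follow directly from the image lists.

```python
import random


def split_by_filename(image_paths, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle image paths reproducibly and cut them into train/val/test lists.

    Only file names are handled here; no image or mask is read.
    """
    paths = sorted(image_paths)  # fixed starting order so the seed gives the same split every run
    random.Random(seed).shuffle(paths)
    n_test = int(len(paths) * test_frac)
    n_val = int(len(paths) * val_frac)
    test = paths[:n_test]
    val = paths[n_test:n_test + n_val]
    train = paths[n_test + n_val:]
    return train, val, test
```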
Edit to add:
Here was my attempt following the TensorFlow example that I linked above. It runs, but I can't get get_label to do what I want.
import tensorflow as tf
import numpy as np
import os
AUTOTUNE = tf.data.experimental.AUTOTUNE
base = '/mnt/projects/CNN_Data/clean_data/'
list_ds = tf.data.Dataset.list_files(base + 'ims/*')
def get_label(file_path):
    parts = tf.strings.split(file_path, os.path.sep)
    parts[-2] == 'masks'
    fname = tf.strings.split(parts[-1], '.')[0]
    fname = tf.strings.join([fname, '.npy'])
    parts[-1] == fname
    return parts
def decode_img(img):
    img = tf.image.decode_png(img, channels = 3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    return img
def process_path(file_path):
    label = get_label(file_path)
    img = tf.io.read_file(file_path)
    img = decode_img(img)
    return img, label
labeled_ds = list_ds.map(process_path, num_parallel_calls=AUTOTUNE)
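In get_label above, `parts[-2] == 'masks'` and `parts[-1] == fname` are comparisons whose results are discarded (tensors cannot be modified in place), so the function returns the original image path pieces. One way around this is to build the mask path from the image file name and load both files lazily in plain Python, one pair at a time. The sketch assumes the ims/ and masks/ layout from the question; `mask_path_for` and `pair_generator` are hypothetical helpers, and `load_image` can be any path-to-array function, such as the `imread` from imageio that the question already uses.

```python
import os

import numpy as np


def mask_path_for(image_path, mask_dir):
    """Map .../ims/test1.png to <mask_dir>/test1.npy: same stem, .npy extension."""
    stem = os.path.splitext(os.path.basename(image_path))[0]
    return os.path.join(mask_dir, stem + ".npy")


def pair_generator(image_paths, mask_dir, load_image):
    """Yield (image, mask) pairs one at a time so the whole dataset never sits in memory."""
    for image_path in image_paths:
        image = np.asarray(load_image(image_path), dtype=np.float32) / 255.0
        mask = np.load(mask_path_for(image_path, mask_dir))
        yield image, mask
```

In TF 2.0 such a generator can be wrapped with `tf.data.Dataset.from_generator`, passing `output_types` and `output_shapes` (here `(256, 256, 3)` and `(256, 256)`, with the mask dtype matching how the .npy files were saved), and then batched and prefetched. Wrapping the same per-file loading in `tf.numpy_function` inside `list_ds.map(...)` is an alternative that lets `num_parallel_calls` load files in parallel.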

Trouble feeding data into tensorflow graph

I have trained a neural network model on MNIST dataset using the script mnist_3.1_convolutional_bigger_dropout.py provided in this tutorial.
I wanted to test the trained model on a custom dataset, so I wrote a small script, predict.py, which loads the trained model and feeds the data to it. I tried two methods for preprocessing the images so that they are compatible with the MNIST format.
Method 1: Resizing the image to 28x28
Method 2: Using the technique mentioned here
Both of these methods result in the error:
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder_2' with dtype float
predict.py
# Importing libraries
from scipy.misc import imread
import tensorflow as tf
import numpy as np
import cv2 as cv
import glob
from test import imageprepare
files = glob.glob('data2/*.*')
#print(files)
# Method 1
'''
img_data = []
for fl in files:
    img = imageprepare(fl)
    img = img.reshape(img.shape[0], img.shape[1], 1)
    img_data.append(img)
'''
# Method 2
dig_cont = [cv.imread(fl, 0) for fl in files]
#print(len(dig_cont))
img_data = []
for i in range(len(dig_cont)):
    img = cv.resize(dig_cont[i], (28, 28))
    img = img.reshape(img.shape[0], img.shape[1], 1)
    img_data.append(img)
print("Restoring Model ...")
sess = tf.Session()
# Step-1: Recreate the network graph. At this step only graph is created.
tf_saver = tf.train.import_meta_graph('model/model.meta')
# Step-2: Now let's load the weights saved using the restore method.
tf_saver.restore(sess, tf.train.latest_checkpoint('model'))
print("Model restored")
x = tf.get_default_graph().get_tensor_by_name('X:0')
print('x :', x.shape)
y = tf.get_default_graph().get_tensor_by_name('Y:0')
print('y :', y.shape)
dict_data = {x: img_data}
result = sess.run(y, feed_dict=dict_data)
print(result)
print(result.shape)
sess.close()
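The problem is fixed: I forgot to feed a value for the dropout placeholder pkeep. After restoring the graph, fetch it by name (the error message suggests it is Placeholder_2 in this graph) and feed 1.0 so that no units are dropped at prediction time:
pkeep = tf.get_default_graph().get_tensor_by_name('Placeholder_2:0')
dict_data = {x: img_data, pkeep: 1.0}
instead of
dict_data = {x: img_data}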
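Once pkeep is fed, the remaining question is whether the custom images look like what the network saw in training. A hedged sketch, assuming the model was trained on MNIST pixels scaled to [0, 1] with white digits on a black background: it stacks the 28x28 grayscale images from Method 2 into a float32 batch of shape (N, 28, 28, 1). `to_mnist_batch` is a hypothetical helper, and `invert=True` only makes sense for dark digits on a light background.

```python
import numpy as np


def to_mnist_batch(gray_images, invert=False):
    """Stack 28x28 (or 28x28x1) uint8 grayscale images into a float32 (N, 28, 28, 1) batch in [0, 1]."""
    batch = np.stack([np.asarray(img, dtype=np.float32).reshape(28, 28) for img in gray_images]) / 255.0
    if invert:
        batch = 1.0 - batch  # dark-on-light digits become light-on-dark, like MNIST
    return batch[..., np.newaxis]
```

Usage would then be along the lines of `sess.run(y, feed_dict={x: to_mnist_batch(img_data), pkeep: 1.0})`.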

ValueError: can't reshape array of size 315 into shape (32,32)

I'm trying to use the code in this page: https://medium.com/#muskulpesent/create-numpy-array-of-images-fecb4e514c4b
import cv2
import glob
import numpy as np
#Train data
train = []
train_labels = []
files = glob.glob (r"C:\Users\Downloads\All_Codes\image\0\*.png") # your image path
for myFile in files:
    image = cv2.imread (myFile ,cv2.IMREAD_GRAYSCALE)
    input_img_resize=cv2.resize(image,(64,64))
    train.append (input_img_resize)
    train_labels.append([0])
print(train)
print(len(train))
files = glob.glob (r"C:\Users\Downloads\All_Codes\image\1\*.png")
for myFile in files:
    image = cv2.imread (myFile,cv2.IMREAD_GRAYSCALE)
    print(image)
    #input_img_resize=cv2.resize(image,(64,64))
    train.append (image)
    train_labels.append([1])
print(len(train_labels))
print(train_labels)
train = np.array(train,dtype=object) #as mnist
train_labels = np.array(train_labels,dtype=object) #as mnist
# convert (number of images x height x width x number of channels) to (number of images x (height * width *3))
# for example (120 * 40 * 40 * 3)-> (120 * 4800)
train = np.reshape(train,(train.shape[0],64,64))
# save numpy array as .npy formats
np.save('train',train)
np.save('train_labels',train_labels)
But I get errors. The problem is that I get the same error every time I read my images and reshape them with np.reshape. I searched a lot and tried many code samples, and they all fail the same way: I can't reshape (the number of images in my dataset) into (32, 32), which is the shape my CNN model expects. The only thing I know for sure is that the images in my dataset have different shapes. Is this why I'm having difficulty reshaping them? Then what's the point of using "resize" and "reshape"?
The first error is:
ValueError: cannot reshape array of size 315 into shape (315,32,32)
for this line:
train = np.reshape(train,[train.shape[0],32,32])
So, I solved the problem.
import cv2
import glob
import numpy as np
import PIL.Image
#Train data
train = []
train_labels = []
files = glob.glob (r"\train\0\*.png") # your image path
for myFile in files:
    image = cv2.imread (myFile ,cv2.IMREAD_GRAYSCALE)
    input_img_resize=cv2.resize(image,(64,64))
    train.append (input_img_resize)
    train_labels.append([0])
#print(train)
#print(len(train))
files = glob.glob (r"\train\1\*.png")
for myFile in files:
    image = cv2.imread (myFile,cv2.IMREAD_GRAYSCALE)
    input_img_resize=cv2.resize(image,(64,64))
    #print(input_img_resize)
    train.append (input_img_resize)
    train_labels.append([1])
print(len(train))
print(len(train_labels))
train = np.array(train,dtype="float32") #as mnist
train_labels = np.array(train_labels,dtype="float32") #as mnist
train = np.reshape(train,(-1,64,64,1))
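I resized my images with cv2.resize (inside the loop), then reshaped the whole array with np.reshape.
Relying on only one of them does not work; I need both. cv2.resize gives every image the same 64x64 size, so np.array can build a regular (315, 64, 64) array. Without it, np.array cannot build a regular block from differently sized images (with dtype=object it falls back to a one-dimensional array of 315 image objects, which is why the error mentions an array of size 315). np.reshape then adds the channel axis, giving (315, 64, 64, 1).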
The output is:
315 #len for x and y
315
(315, 64, 64) #after cv2.resize
(315, 1)
(315, 64, 64, 1) #after np.reshape
(315, 1)
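The fix above boils down to one rule: every image must have the same size before the list is turned into an array. A small sketch that checks this up front and then adds the channel axis, shown for 64x64 to match the working code; `stack_uniform` is a hypothetical helper, and the resizing itself is still done with cv2.resize as in the answer.

```python
import numpy as np


def stack_uniform(images, height=64, width=64):
    """Stack 2-D grayscale images of one fixed size into a (N, height, width, 1) float32 array."""
    for i, img in enumerate(images):
        if img.shape != (height, width):
            raise ValueError(f"image {i} has shape {img.shape}, expected {(height, width)}; resize it first")
    batch = np.array(images, dtype=np.float32)  # regular (N, height, width) block
    return batch.reshape(-1, height, width, 1)  # add the single channel axis a CNN expects
```

Failing early with the index of the offending image is easier to debug than the reshape error, which only reports the total size of the object array.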

Loading images and eval() in tensorflow are super slow

X = []
filelist = gfile.ListDirectory(path_imgs)
for filename in filelist:
    path_filename = path_imgs + filename
    image_file = file_io.FileIO(path_filename,'rb')
    image_raw = image_file.read()
    img = tf.image.decode_image(image_raw)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize_image_with_pad(img, img_size, img_size, method=1).eval(session=tf.Session())
    X.append(img)
imgs = np.array(X)
I tried a few things with the session, but they didn't work. It should probably be handled differently, but I don't know how. Any ideas?
EDIT:
Yes, I want to train an ANN to segment objects in images.
There are folders with images and their masks. There are thousands of images, possibly tens of thousands.
I need a single NumPy array of images, which will be saved and used later as the dataset for model training.
def getImageData(fileNameList):
    imageData = []
    for fn in fileNameList:
        testImage = Image.open(fn)
        # testImage.show()  # opens an image viewer for every file; leave it off for bulk loading
        imageData.append(np.array(testImage))
    return np.array(imageData, dtype=np.float32)
imageFn=("dog.png",)
imageData=getImageData(imageFn)
You need these imports:
import tensorflow as tf
from PIL import Image
import numpy as np
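Part of why the question's loop is slow: in TensorFlow 1.x graph mode every pass calls tf.image.decode_image and friends, which add new ops to the default graph, and `.eval(session=tf.Session())` opens a brand-new session for each image, so the graph keeps growing and every file pays the session start-up cost. Decoding with PIL as above avoids that. For tens of thousands of images, another option is to write them straight into a preallocated .npy file on disk with `np.lib.format.open_memmap` instead of collecting a Python list first. A minimal sketch, assuming every image already has the same (height, width, channels) shape; `images_to_npy` is a hypothetical helper and `load_image` is any path-to-array function, e.g. `lambda p: np.array(Image.open(p))`.

```python
import numpy as np


def images_to_npy(paths, out_file, shape, load_image):
    """Write images one at a time into a preallocated float32 .npy file on disk.

    shape is the (height, width, channels) that every image must already have.
    """
    shape = tuple(shape)
    out = np.lib.format.open_memmap(out_file, mode="w+", dtype=np.float32,
                                    shape=(len(paths),) + shape)
    for i, path in enumerate(paths):
        img = np.asarray(load_image(path), dtype=np.float32) / 255.0
        if img.shape != shape:
            raise ValueError(f"{path}: shape {img.shape}, expected {shape}")
        out[i] = img  # only one image is held in memory at a time
    out.flush()
    del out  # release the memory map so the file is complete on disk
    return out_file
```

The saved file is a normal .npy file, so `np.load(out_file, mmap_mode="r")` opens it later without reading everything into RAM.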
