Loading images and eval() in tensorflow are super slow - python

X = []
filelist = gfile.ListDirectory(path_imgs)
for filename in filelist:
    path_filename = path_imgs + filename
    image_file = file_io.FileIO(path_filename,'rb')
    image_raw = image_file.read()
    img = tf.image.decode_image(image_raw)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize_image_with_pad(img, img_size, img_size, method=1).eval(session=tf.Session())
    X.append(img)
imgs = np.array(X)
I tried a few things with the session, but nothing worked. It should probably be handled differently, but I don't know how. Any ideas?
EDIT:
Yes, I want to train an ANN to segment objects in images.
There are folders with images and their masks. There are thousands of images, and there could be tens of thousands.
I need a single NumPy array of images, which will be saved and used later as the dataset for model training.

You need these imports:
import tensorflow as tf
from PIL import Image
import numpy as np

def getImageData(fileNameList):
    imageData = []
    for fn in fileNameList:
        testImage = Image.open(fn)
        testImage.show()  # opens a viewer for every image; remove this for large folders
        imageData.append(np.array(testImage))
    # all images must share one shape for this to yield a single numeric array
    return np.array(imageData, dtype=np.float32)

imageFn = ("dog.png",)
imageData = getImageData(imageFn)
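The function above only yields a single numeric array when every file already has the same size. Here is a minimal sketch of one way to handle mixed sizes without a TensorFlow session. It assumes Pillow's ImageOps.pad (resize plus centered padding) is an acceptable stand-in for resize_image_with_pad. The name load_images_padded and its parameters are hypothetical:

```python
from pathlib import Path

import numpy as np
from PIL import Image, ImageOps


def load_images_padded(folder, img_size):
    """Load every PNG in `folder` as RGB, resize with padding to a square,
    and return one float32 array of shape (n, img_size, img_size, 3) in [0, 1]."""
    arrays = []
    for path in sorted(Path(folder).glob("*.png")):
        with Image.open(path) as im:
            # Keep the aspect ratio and pad the remainder with black.
            padded = ImageOps.pad(im.convert("RGB"), (img_size, img_size),
                                  color=(0, 0, 0))
        arrays.append(np.asarray(padded, dtype=np.float32) / 255.0)
    return np.stack(arrays)
```

The result can be written once with np.save and reloaded with np.load at training time, so the decoding cost is paid only once.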

Related

Setting up data for keras using pandas

How do I read data from a CSV file and turn it into my training data and labels?
TRAIN_DATA = "C:\\Users\jackt\Desktop\machine_learning_coursework\MY_TRAIN.csv"
TEST_DATA = "C:\\Users\jackt\Desktop\machine_learning_coursework\MY_LABELS.csv"
train_file_path = tf.keras.utils.get_file("MY_TRAIN.csv", TRAIN_DATA)
test_file_path = tf.keras.utils.get_file("MY_LABELS.csv", TEST_DATA)
np.set_printoptions(precision=3, suppress=True)
My output error is:
Exception: URL fetch failure on C:\Users\jackt\Desktop\machine_learning_coursework\MY_TRAIN.csv: None -- unknown url type: c
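tf.keras.utils.get_file expects a URL to download from, which is why it fails on a local Windows path. Read local files directly instead: read the CSV file, create two empty lists, append the image and label data to the respective lists, convert them to NumPy arrays, and save them.
At training time, load the saved arrays and shuffle them, for example with scikit-learn. There are numerous tutorials for that.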
import pandas as pd
from os.path import join
import cv2
import numpy as np
from keras.utils import to_categorical
train_df = pd.read_csv('train.csv')
# Read columns (multiple ways to do it)
images = train_df["images"]
labels = train_df["labels"]
train_data = []
train_label = []
for (image, label) in zip(images, labels):
    img = cv2.imread(join("train_dir", image))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # if you want to resize images to a particular shape
    img = cv2.resize(img, (160, 160), interpolation=cv2.INTER_AREA)
    # Append images and labels
    train_data.append(img)
    train_label.append(label)
train_data = np.array(train_data)
train_label = np.array(train_label)
train_label = to_categorical(train_label)
np.save("train_data.npy", train_data)
np.save("train_label.npy", train_label)
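The loading and shuffling step at training time can be sketched as follows. This version swaps in a plain NumPy permutation instead of a scikit-learn helper. The function name, val_fraction, and seed are assumptions made for illustration:

```python
import numpy as np


def shuffle_and_split(data, labels, val_fraction=0.2, seed=0):
    """Shuffle two arrays with the same permutation, then split off a validation part."""
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))  # one permutation keeps pairs aligned
    data, labels = data[order], labels[order]
    n_val = int(len(data) * val_fraction)
    # Returns (train_data, train_labels), (val_data, val_labels)
    return (data[n_val:], labels[n_val:]), (data[:n_val], labels[:n_val])
```

With the files saved above, the inputs would come from np.load("train_data.npy") and np.load("train_label.npy").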

What is solution for this error "ValueError: If evaluating from data tensors, you should specify the `steps` argument"?

I am testing my deep learning model with this code:
from keras.models import load_model
classifier = load_model('Trained_model.h5')
classifier.evaluate()
# Prediction of single image
import numpy as np
from keras.preprocessing import image
img_name = input('Enter Image Name: ')
image_path = './predicting_data/test_set/{}'.format(img_name)
print('')
After running it, I get this error:
ValueError: If evaluating from data tensors, you should specify the `steps` argument.
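NOTE: ./predicting_data/test_set is the path of my test dataset, which has subfolders named A, B, C, and so on through Z, each containing images.
The error most likely comes from calling classifier.evaluate() with no data: evaluate() needs the test inputs and labels (or a generator together with steps). The working code to predict the class of a single image by loading the saved model is shown below: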
import os
import tensorflow as tf
from tensorflow.keras.preprocessing import image
Test_Dir = '/Dogs_Vs_Cats_Small/test/cats'
New_Model = tf.keras.models.load_model('Dogs_Vs_Cats.h5')
New_Model.summary()
Image_Path = os.path.join(Test_Dir, 'cat.1500.jpg')
Img = image.load_img(Image_Path, target_size = (150,150))
Img_Array = image.img_to_array(Img)
Img_Array = Img_Array/255.0
Img_Array = tf.reshape(Img_Array, (-1,150,150,3))
Predictions = New_Model.predict(Img_Array)
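Label = tf.argmax(Predictions, axis=1)  # argmax over the class axis; the default axis=0 would run over the batch
Label.numpy()[0]
The final line gives the predicted class index for the image, assuming the model ends in a softmax layer with one output per class. For a model with a single sigmoid output, compare Predictions[0][0] with 0.5 instead.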
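As a sketch of how a prediction array maps to a class, the helper below assumes Keras-style outputs: shape (batch, n_classes) for softmax models, or (batch, 1) for a single sigmoid unit. The name predicted_classes and the 0.5 threshold are illustrative choices, not part of the answer above:

```python
import numpy as np


def predicted_classes(predictions, threshold=0.5):
    """Turn model outputs into integer class indices, one per input image."""
    predictions = np.asarray(predictions)
    if predictions.ndim == 2 and predictions.shape[1] == 1:
        # Single sigmoid output: class 1 if the probability exceeds the threshold.
        return (predictions[:, 0] > threshold).astype(int)
    # Softmax output: pick the most probable class along the class axis.
    return predictions.argmax(axis=1)


print(predicted_classes([[0.1, 0.7, 0.2]]))  # softmax row with three classes
print(predicted_classes([[0.83]]))           # sigmoid output for one image
```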

Tensorflow Pipeline for images and numpy files

I am working with TensorFlow 2.0.0 and am trying to set up an efficient pipeline for feeding in ~90,000 PNG images of size (256, 256, 3) and their labels, which are NumPy arrays of size (256, 256), for an image segmentation problem. The images and labels won't fit fully into memory.
The data are stored in a directory like this:
'C:/Users/user/Documents/data/ims/' #png images
'C:/Users/user/Documents/data/masks/' #img labels/masks
The file names are the same apart from the extension, so for example "test1.png" and "test1.npy" are an image/label pair.
The data are not split into training, validation, and test subsets yet.
I need to get to a point at which both the images and labels are split into training, validation, and test subsets, and I have a means to feed the data into a model for training.
I was following this guide here but could not figure out how to deal with the NumPy files within the get_label function.
I thought I could write a function that splits the data into subsets by file name alone and then loads the batches on the fly from those file names, but I can't figure out how to do this efficiently.
I'm currently doing the following, which either fails because the files are too big or is too slow because there are so many files to load into memory; neither is a viable solution.
import tensorflow as tf
import numpy as np
import glob2 as glob
from imageio import imread
base = '/mnt/projects/CNN_Data/clean_data/'
image_path = sorted(glob.glob(base + 'ims/*.png'))
label_path = sorted(glob.glob(base + 'masks/*.npy'))
images = [imread(img).astype(np.float32)/255.0 for img in image_path]
labels = [np.load(path) for path in label_path]
Edit to add:
Here was my attempt following the TensorFlow example that I linked above. It runs, but I can't get get_label to do what I want.
import tensorflow as tf
import numpy as np
import os
AUTOTUNE = tf.data.experimental.AUTOTUNE
base = '/mnt/projects/CNN_Data/clean_data/'
list_ds = tf.data.Dataset.list_files(base + 'ims/*')
def get_label(file_path):
    parts = tf.strings.split(file_path, os.path.sep)
    parts[-2] == 'masks'  # note: a comparison, not an assignment, so it has no effect
    fname = tf.strings.split(parts[-1], '.')[0]
    fname = tf.strings.join([fname, '.npy'])
    parts[-1] == fname  # same here: parts is left unchanged
    return parts

def decode_img(img):
    img = tf.image.decode_png(img, channels = 3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    return img

def process_path(file_path):
    label = get_label(file_path)
    img = tf.io.read_file(file_path)
    img = decode_img(img)
    return img, label

labeled_ds = list_ds.map(process_path, num_parallel_calls=AUTOTUNE)
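One possible way to handle the pairing and splitting by file name alone is sketched below with the standard library only. It assumes the ims/ and masks/ layout described in the question. The names pair_paths and split_pairs, the split fractions, and the seed are hypothetical. Only paths are held in memory, so each image and mask can still be loaded lazily per batch:

```python
import random
from pathlib import Path


def pair_paths(image_dir, mask_dir):
    """Return sorted (image_path, mask_path) pairs that share a file stem."""
    pairs = []
    for img in sorted(Path(image_dir).glob("*.png")):
        mask = Path(mask_dir) / (img.stem + ".npy")
        if mask.exists():  # skip images without a matching mask
            pairs.append((str(img), str(mask)))
    return pairs


def split_pairs(pairs, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle the pairs once and cut them into train/validation/test lists."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_val = int(len(pairs) * val_fraction)
    n_test = int(len(pairs) * test_fraction)
    val = pairs[:n_val]
    test = pairs[n_val:n_val + n_test]
    train = pairs[n_val + n_test:]
    return train, val, test
```

Each list of paths could then feed a Python generator (for example through tf.data.Dataset.from_generator) that reads the PNG and calls np.load on the mask path one sample at a time.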

Why would this dataset implementation run out of memory?

I followed these instructions and wrote the following code to create a Dataset of images (COCO2014 training set):
from pathlib import Path
import tensorflow as tf
def image_dataset(filepath, image_size, batch_size, norm=True):
    def preprocess_image(image):
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.resize(image, image_size)
        if norm:
            image /= 255.0 # normalize to [0,1] range
        return image

    def load_and_preprocess_image(path):
        image = tf.read_file(path)
        return preprocess_image(image)

    all_image_paths = [str(f) for f in Path(filepath).glob('*')]
    path_ds = tf.data.Dataset.from_tensor_slices(all_image_paths)
    ds = path_ds.map(load_and_preprocess_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds = ds.shuffle(buffer_size = len(all_image_paths))
    ds = ds.repeat()
    ds = ds.batch(batch_size)
    ds = ds.prefetch(tf.data.experimental.AUTOTUNE)
    return ds

ds = image_dataset(train2014_dir, (256, 256), 4, False)
image = ds.make_one_shot_iterator().get_next('images')
# image is then fed to the network
This code always runs out of both main memory (32 GB) and GPU memory (11 GB), and the process gets killed.
I also noticed that the program gets stuck at sess.run(opt_op). What is wrong, and how can I fix it?
The problem is this:
ds = ds.shuffle(buffer_size = len(all_image_paths))
The buffer that Dataset.shuffle() uses is an in-memory buffer, so you are effectively trying to load the whole dataset into memory.
You have a couple of options (which you can combine) to fix this:
Option 1:
Reduce the buffer size to a much smaller number.
Option 2:
Move the shuffle() statement before the map() statement.
This means shuffling happens before the images are loaded, so the shuffle buffer holds only the filenames rather than huge image tensors.
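Combining both options might look like the sketch below. It is written against the TensorFlow 2 API (tf.io.read_file rather than tf.read_file). The paths argument is assumed to be a list of JPEG file paths, and the shuffle_buffer default is an assumption, not a value from the question:

```python
import tensorflow as tf


def image_dataset(paths, image_size, batch_size, shuffle_buffer=10000):
    """Build a dataset that shuffles file paths first and decodes images afterwards."""
    def load_and_preprocess_image(path):
        image = tf.io.read_file(path)
        image = tf.image.decode_jpeg(image, channels=3)
        return tf.image.resize(image, image_size)

    paths = list(paths)
    ds = tf.data.Dataset.from_tensor_slices(paths)
    # Shuffling here buffers short path strings, not decoded image tensors.
    ds = ds.shuffle(buffer_size=min(len(paths), shuffle_buffer))
    ds = ds.map(load_and_preprocess_image,
                num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds = ds.repeat()
    ds = ds.batch(batch_size)
    return ds.prefetch(tf.data.experimental.AUTOTUNE)
```

Because only strings sit in the shuffle buffer, its size can stay large enough for a good shuffle without holding decoded images.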

valueerror: can't reshape array of size 315 into shape (32,32)

I'm trying to use the code in this page: https://medium.com/#muskulpesent/create-numpy-array-of-images-fecb4e514c4b
import cv2
import glob
import numpy as np
#Train data
train = []
train_labels = []
files = glob.glob (r"C:\Users\Downloads\All_Codes\image\0\*.png") # your image path
for myFile in files:
    image = cv2.imread (myFile ,cv2.IMREAD_GRAYSCALE)
    input_img_resize=cv2.resize(image,(64,64))
    train.append (input_img_resize)
    train_labels.append([0])
print(train)
print(len(train))
files = glob.glob (r"C:\Users\Downloads\All_Codes\image\1\*.png")
for myFile in files:
    image = cv2.imread (myFile,cv2.IMREAD_GRAYSCALE)
    print(image)
    #input_img_resize=cv2.resize(image,(64,64))
    train.append (image)
    train_labels.append([1])
print(len(train_labels))
print(train_labels)
train = np.array(train,dtype=object) #as mnist
train_labels = np.array(train_labels,dtype=object) #as mnist
# convert (number of images x height x width x number of channels) to (number of images x (height * width *3))
# for example (120 * 40 * 40 * 3)-> (120 * 4800)
train = np.reshape(train,(train.shape[0],64,64))
# save numpy array as .npy formats
np.save('train',train)
np.save('train_labels',train_labels)
But I got errors. I get the same error every time I try to read my images and reshape them with np.reshape. I searched a lot and tried many code samples, and they all behave the same: I can't reshape the array (the number of images in my dataset) to (32, 32), which is the shape I want to feed to my CNN model. The only thing I know for sure is that the images in my dataset have different shapes. Is this why I'm having difficulty reshaping them? And what is the point of using both "resize" and "reshape"?
The first error is:
ValueError: cannot reshape array of size 315 into shape (315,32,32)
for this line:
train = np.reshape(train,[train.shape[0],32,32])
So, I solved the problem.
import cv2
import glob
import numpy as np
import PIL.Image
#Train data
train = []
train_labels = []
files = glob.glob (r"\train\0\*.png") # your image path
for myFile in files:
    image = cv2.imread (myFile ,cv2.IMREAD_GRAYSCALE)
    input_img_resize=cv2.resize(image,(64,64))
    train.append (input_img_resize)
    train_labels.append([0])
#print(train)
#print(len(train))
files = glob.glob (r"\train\1\*.png")
for myFile in files:
    image = cv2.imread (myFile,cv2.IMREAD_GRAYSCALE)
    input_img_resize=cv2.resize(image,(64,64))
    #print(input_img_resize)
    train.append (input_img_resize)
    train_labels.append([1])
print(len(train))
print(len(train_labels))
train = np.array(train,dtype="float32") #as mnist
train_labels = np.array(train_labels,dtype="float32") #as mnist
train = np.reshape(train,(-1,64,64,1))
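I resized the images with cv2.resize inside the loop, then reshaped the array with np.reshape.
Relying on only one of them does not work; both are needed. cv2.resize gives every image the same 64x64 shape so that np.array can build one numeric array, and np.reshape then adds the channel axis.
The output is: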
315 #len for x and y
315
(315, 64, 64) #after cv2.resize
(315, 1)
(315, 64, 64, 1) #after np.reshape
(315, 1)
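Here is a small NumPy-only illustration of why both steps were needed, using made-up array sizes rather than the real dataset. A list of differently sized images becomes a 1-D object array whose size is just the number of images, so it cannot be reshaped into image dimensions. Equally sized images stack into a proper numeric array:

```python
import numpy as np

# Two "images" with different shapes, as when one class is left unresized.
ragged = np.array([np.zeros((64, 64)), np.zeros((20, 8))], dtype=object)
print(ragged.shape)  # one axis only: the number of images

try:
    np.reshape(ragged, (ragged.shape[0], 32, 32))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print(err)

# Once every image has the same shape, the array is numeric and reshape
# only needs to add the channel axis.
uniform = np.array([np.zeros((64, 64)), np.zeros((64, 64))], dtype="float32")
with_channel = np.reshape(uniform, (-1, 64, 64, 1))
print(uniform.shape, with_channel.shape)
```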
