I'm asking a general question about image processing applied to a machine learning pipeline. In this post, I will use ML to mean any algorithm that is not deep learning (i.e. it doesn't use a neural network).
I'm developing a classifier to catalog .png images of different clothes. I have labels (for each image I know the category), so it's a supervised learning problem.
My objective is to use PCA to reduce the problem's dimensionality and then use a bag of visual words to perform the classification. I'm using Python for this project.
The problem is that each photo has a different size and a different width-to-height ratio (so I can't simply resize them to a fixed width, because I wouldn't get a single common height across all images).
My (inelegant) solution is to fix the width at 200 px and then pad each image with rows of zeros (each image becomes a NumPy array of maximum_h rows, each row w pixels long).
Here is the script:
import numpy as np
from PIL import Image
from tqdm import tqdm

# helper function to convert an image into an array
def get_image(image_path: str, resize=True, w=300):
    """
    :param image_path: string, path of the image
    :param resize: boolean, if True the image is resized. Default: True
    :param w: integer, specify the width of the resized image
    :return: greyscale PIL image (converted to a NumPy array by the caller)
    """
    try:
        image = Image.open(image_path).convert("L")
        if resize:
            wpercent = w / float(image.size[0])
            hsize = int(float(image.size[1]) * wpercent)
            image = image.resize((w, hsize), Image.ANTIALIAS)
        #pixel_values = np.array(image.getdata())
        return image
    except Exception:
        # e.g. AI19/04442.png and AI18/02971.png are corrupted
        #print(image_path)
        return None
def extract_images(paths: list, categories: list, w: int, maximum_h: int):
    A = np.zeros([len(paths), w * maximum_h])
    y = []
    counter = 0
    for image_path, label in tqdm(zip(paths, categories)):
        im = get_image(image_path, w=w)
        if im is not None:
            # adapt images to fit: pad with rows of zeros up to maximum_h
            # (the resized width w_ always equals the requested w)
            h, w_ = np.array(im).shape
            delta_h = maximum_h - h
            zeros_ = np.zeros((delta_h, w_), dtype=int)
            im = np.concatenate((im, zeros_), axis=0)
            A[counter, :] = im.reshape(1, -1)
            y.append(label)
            counter += 1
        else:
            continue
    return (A, y)
The problem is that the classifier performs badly (around 20% accuracy), because the large amount of zeros added to each image increases the dimensionality without adding any information.
Looking at the principal components with the largest eigenvalues, I see that a lot of weight is concentrated in these "padding" areas (which confirms my impression).
Is there a better way to handle images of different sizes in Python?
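For comparison, a minimal sketch of one alternative (not part of my current pipeline; it assumes Pillow's ImageOps.fit): resize each image while preserving its aspect ratio, then center-crop it to a fixed square, so every image flattens to the same number of pixels and no zero padding is needed.

import numpy as np
from PIL import Image, ImageOps

def get_image_square(image_path: str, side: int = 200):
    # Resize the shorter side to `side` pixels and center-crop to a side x side square,
    # so every image has the same shape without any zero padding.
    try:
        image = Image.open(image_path).convert("L")
        return np.array(ImageOps.fit(image, (side, side), method=Image.ANTIALIAS))
    except Exception:
        # corrupted file
        return None

Each valid image then contributes exactly side * side features, so PCA never sees padding-only dimensions (at the cost of cropping away part of the taller images).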
Related
I was trying to create a neural network to distinguish forest from other land in satellite images.
I started analysing the images but I'm not sure how to normalize the pixel values.
I thought of dividing each pixel value by 255, but in an example by bnsreenu I found this part:
import os
import cv2
import numpy as np
from PIL import Image
from patchify import patchify
from sklearn.preprocessing import MinMaxScaler, StandardScaler

scaler = MinMaxScaler()

root_directory = 'Semantic segmentation dataset/'
patch_size = 256

# Read images from the respective 'images' subdirectory.
# As all images are of different sizes we have 2 options: either resize or crop.
# But some images are too large and some too small, and resizing would change the size of real objects.
# Therefore, we will crop them to the nearest size divisible by 256 and then
# divide all images into patches of 256x256x3.
image_dataset = []
for path, subdirs, files in os.walk(root_directory):
    #print(path)
    dirname = path.split(os.path.sep)[-1]
    if dirname == 'images':  # Find all 'images' directories
        images = os.listdir(path)  # List of all image names in this subdirectory
        for i, image_name in enumerate(images):
            if image_name.endswith(".jpg"):  # Only read jpg images...
                image = cv2.imread(path + "/" + image_name, 1)  # Read each image as BGR
                SIZE_X = (image.shape[1] // patch_size) * patch_size  # Nearest size divisible by our patch size
                SIZE_Y = (image.shape[0] // patch_size) * patch_size  # Nearest size divisible by our patch size
                image = Image.fromarray(image)
                image = image.crop((0, 0, SIZE_X, SIZE_Y))  # Crop from the top-left corner
                #image = image.resize((SIZE_X, SIZE_Y))  # Try not to resize for semantic segmentation
                image = np.array(image)

                # Extract patches from each image
                print("Now patchifying image:", path + "/" + image_name)
                patches_img = patchify(image, (patch_size, patch_size, 3), step=patch_size)  # step=256 for 256x256 patches means no overlap

                for i in range(patches_img.shape[0]):
                    for j in range(patches_img.shape[1]):
                        single_patch_img = patches_img[i, j, :, :]

                        # Use MinMaxScaler instead of just dividing by 255.
                        single_patch_img = scaler.fit_transform(single_patch_img.reshape(-1, single_patch_img.shape[-1])).reshape(single_patch_img.shape)
                        #single_patch_img = (single_patch_img.astype('float32')) / 255.

                        single_patch_img = single_patch_img[0]  # Drop the extra unnecessary dimension that patchify adds.
                        image_dataset.append(single_patch_img)
In this example he uses a MinMaxScaler, which gives different values compared to dividing by 255.
Which method is better or more suited to the situation?
I'll leave the link below:
github repo with full code
MinMaxScaler may indeed produce values different from a simple division by 255 (in case an image has no pixels with intensities 0 or 255). As the official scikit-learn documentation says, it performs the following transformation:
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
where max and min are the desired bounds of the target range.
Therefore, normalizing data is a rather data- (and probably model-) specific operation. Division by 255 is the most common way to do it and is enough in many cases. As you are using a neural network, you can check the answers to this question to learn more about why you should normalize/center your data.
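To see the difference concretely, here is a small toy example (my own, not from the linked repo) comparing MinMaxScaler with plain division by 255 on a patch whose intensities never reach 0 or 255:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy 2x2 RGB patch; its values never reach 0 or 255.
patch = np.array([[[50, 100, 150], [60, 110, 160]],
                  [[70, 120, 170], [80, 130, 180]]], dtype=np.uint8)

# Plain division: keeps the original relative intensities.
div255 = patch.astype('float32') / 255.0

# MinMaxScaler: stretches each channel to exactly [0, 1] using that patch's own min/max.
minmax = MinMaxScaler().fit_transform(patch.reshape(-1, 3)).reshape(patch.shape)

print(div255[..., 0].ravel())   # approx [0.196, 0.235, 0.275, 0.314]
print(minmax[..., 0].ravel())   # [0.0, 0.333..., 0.667..., 1.0]

Division by 255 preserves absolute brightness across patches, while MinMaxScaler makes every patch span the full [0, 1] range, which discards each patch's overall brightness information.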
def preprocess(self):
    # Import image
    pic1 = self.path
    raw_image = cv2.imread(pic1)
    #cv2.imshow('Raw image', raw_image)
    #cv2.waitKey(0)

    # Resize image
    dim = (320, 180)
    resized = cv2.resize(raw_image, dim)
    #cv2.imshow('Resized Image', resized)
    #cv2.waitKey(0)

    # Scale image to [-1, 1] using per-image min/max normalization
    scaled = cv2.normalize(resized, None, alpha=-1, beta=1,
                           norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
    #cv2.imshow('Scaled Image', scaled)
    #cv2.waitKey(0)

    return scaled
I'm trying to scale the pixel values of "raw_image" to the range -1 to 1 as part of preprocessing for identifying an object using machine learning. Essentially, a camera takes a picture, which is resized and scaled to the same size as the images in the dataset used for training and validation. That image is then passed to the model produced by model.fit() for inference, to detect what the object in the image actually is.
The question here is: "Is this scaling function correct for putting the pixel values in the range -1 to 1?" The image appears SUPER dark when I display it with cv2.imshow, and I'm afraid the model isn't recognizing it properly.
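For comparison, a minimal sketch (my own assumption, not the original pipeline) of fixed-range scaling, which maps 8-bit pixels to [-1, 1] with constants instead of each image's own min/max, so the same scene always maps to the same values:

import numpy as np

def scale_fixed_range(image_uint8: np.ndarray) -> np.ndarray:
    # Map [0, 255] -> [-1, 1] deterministically: 0 -> -1, 127.5 -> 0, 255 -> 1.
    return image_uint8.astype(np.float32) / 127.5 - 1.0

Also note that cv2.imshow interprets a floating-point image as lying in [0, 1] (it multiplies by 255 for display), so values in [-1, 1] will render mostly dark regardless of whether the scaling itself is what the model expects.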
I have about 1000 images in a csv file. I have already managed to load those images into my Python program by doing the following:
df = pd.read_csv("./Testes_small.csv")

# Creates the dataframe
training_set = pd.DataFrame({'Images': training_imgs, 'Labels': training_labels})

train_dataGen = ImageDataGenerator(rescale=1./255)
train_generator = train_dataGen.flow_from_dataframe(dataframe=training_set, directory="",
                                                    x_col="Images", y_col="Labels",
                                                    class_mode="categorical",
                                                    target_size=(224, 224), batch_size=32)

## Steps to plot the images
imgs, labels = next(train_generator)
for i in range(batch_size):  # range from 0 to 31
    image = imgs[i]
    plt.imshow(image)
    plt.show()
So now I have the train_generator variable of type python.keras.preprocessing.image.DataframeIterator. Its size is (32, 224, 224, 3).
In the ImageDataGenerator I want to plug in my own preprocessing function to resize the images. I want to do this because I have some rectangular images that lose their aspect ratio when resized.
For example, these images before (upper image) and after (the lower one) resizing:
Clearly the second image loses its shape.
I found this function (it's the answer to a previous thread):
def resize_image(self, image: Image, length: int) -> Image:
    """
    Resize an image to a square. Can make an image bigger to make it fit or smaller if it doesn't fit. It also crops
    part of the image.

    :param self:
    :param image: Image to resize.
    :param length: Width and height of the output image.
    :return: Return the resized image.

    Resizing strategy:
    1) We resize the smallest side to the desired dimension (e.g. 1080)
    2) We crop the other side so as to make it fit with the same length as the smallest side (e.g. 1080)
    """
    if image.size[0] < image.size[1]:
        # The image is in portrait mode. Height is bigger than width.

        # This makes the width fit the LENGTH in pixels while conserving the ratio.
        resized_image = image.resize((length, int(image.size[1] * (length / image.size[0]))))

        # Amount of pixels to lose in total on the height of the image.
        required_loss = (resized_image.size[1] - length)

        # Crop the height of the image so as to keep the center part.
        resized_image = resized_image.crop(
            box=(0, required_loss / 2, length, resized_image.size[1] - required_loss / 2))

        # We now have a length*length pixels image.
        return resized_image
    else:
        # This image is in landscape mode or already squared. The width is bigger than the height.

        # This makes the height fit the LENGTH in pixels while conserving the ratio.
        resized_image = image.resize((int(image.size[0] * (length / image.size[1])), length))

        # Amount of pixels to lose in total on the width of the image.
        required_loss = resized_image.size[0] - length

        # Crop the width of the image so as to keep 1080 pixels of the center part.
        resized_image = resized_image.crop(
            box=(required_loss / 2, 0, resized_image.size[0] - required_loss / 2, length))

        # We now have a length*length pixels image.
        return resized_image
I'm trying to use it like this: img_datagen = ImageDataGenerator(rescale=1./255, preprocessing_function=resize_image), but it doesn't work because I'm not passing it an image. Do you have any ideas on how I can do this?
Check the documentation for providing custom functions to the ImageDataGenerator. It says, and I quote:
"preprocessing_function: function that will be applied on each input. The function will run after the image is resized and augmented. The function should take one argument: one image (Numpy tensor with rank 3), and should output a Numpy tensor with the same shape."
From the above documentation we can note the following:
When will this function be executed:
after the image is resized,
after any data augmentation which has to be done.
Function argument requirements:
only one argument,
this argument is for only one NumPy image,
the image should be a NumPy tensor of rank 3.
Function output requirements:
one output image,
which should be the same shape as the input.
This last point is really important for your question. Since your function is resizing the image, its output will not be the same shape as the input, so you cannot do this directly.
One alternative is to resize your dataset before passing it to the ImageDataGenerator.
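As an illustration of that route, a minimal sketch (my own example, with hypothetical directory names) that applies the resize_image function quoted above, used as a standalone function without the self parameter, to every file once, and then lets flow_from_dataframe read the already-square images:

import os
from PIL import Image

SRC_DIR = "images_original"   # hypothetical source folder
DST_DIR = "images_square"     # hypothetical output folder
TARGET = 224                  # matches target_size=(224, 224) above

os.makedirs(DST_DIR, exist_ok=True)
for name in os.listdir(SRC_DIR):
    img = Image.open(os.path.join(SRC_DIR, name))
    # resize_image: the aspect-preserving resize-and-crop function above (drop the unused self argument).
    square = resize_image(img, TARGET)
    square.save(os.path.join(DST_DIR, name))

Since every saved image is already 224x224, the generator's own target_size resize no longer distorts the aspect ratio, and rescale=1./255 and the rest of the pipeline stay unchanged.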
I have a bundle of different images with birds, but some of them contain only feathers, barely visible birds, etc. I need to find the images where the bird is clearly visible and remove the images with feathers, far-distant birds, etc.
I have already tried ORB, simple template matching, and Canny edge detection. And I cannot use neural nets.
Now I'm trying the following algorithm:
Binarize the template image to get shapes
Slide a window over another binarized image and calculate matchShapes against the template in every window
Find the best match
As you can see, this method gives me strange results.
Binary template
Shape on the other binary image, for example:
I calculated matchShapes in different parts of this image, and the best result (~0.05) was in this part:
which is obviously not similar to the original shape.
Code for sliding window:
import cv2

OFFSET = 5
SCALE_RATIO = [0.5, 1]

def get_scaled_list(img_path, template):
    matcher_list = []
    img = cv2.imread(img_path)
    # just binarization and resizing
    img = preprocess(resize_image(img))
    height, width = img.shape
    # building size of scale window
    for scaler in SCALE_RATIO:
        x_point = 0
        y_point = 0
        x1_point = int(width * scaler)
        y1_point = x1_point
        if x1_point > height:
            y1_point = height
        while y1_point <= height:
            while x1_point <= width:
                img1 = img[y_point:y1_point, x_point:x1_point]
                # comparing template and part of image
                diff = cv2.matchShapes(template, img1, cv2.CONTOURS_MATCH_I1, 0)
                data_tuple = (img_path, x_point, y_point, int(width * scaler), diff)
                matcher_list.append(data_tuple)
                x_point += OFFSET
                x1_point += OFFSET
            x_point = 0
            x1_point = int(width * scaler)
            y_point += OFFSET
            y1_point += OFFSET
    return matcher_list
How can I perform shape matching correctly, and why does the best result appear in that region?
The naive sliding-window method with a rigid template will work very poorly. In particular, the sizes are different, making a correct overlap impossible.
What you are trying to achieve is difficult because you only have edge information, and the edges are complex, broken into several independent arcs, with junctions.
You can find many solutions when you have a single closed curve (look up "elastic contour matching", for instance), but not for your case. This would be a case of "approximate elastic graph matching".
Other possible approaches use special distance functions such as the chamfer or Hausdorff distances, but you can still get stuck because of the size mismatch.
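As an illustration of the chamfer idea (only a sketch; it does not solve the scale mismatch), one can score a window by the average distance from the template's edge pixels to the window's nearest edge pixels, computed with a distance transform:

import cv2
import numpy as np

def chamfer_score(window_edges: np.ndarray, template_edges: np.ndarray) -> float:
    # Both inputs: binary uint8 images of the same size, edge pixels > 0.
    # Distance transform of the inverted edge map gives, at every pixel,
    # the distance to the window's nearest edge pixel.
    edges_inv = cv2.bitwise_not((window_edges > 0).astype(np.uint8) * 255)
    dist = cv2.distanceTransform(edges_inv, cv2.DIST_L2, 3)
    mask = template_edges > 0
    if not mask.any():
        return float("inf")
    # Mean distance from template edge pixels to window edges (lower = better match).
    return float(dist[mask].mean())

A symmetric version (also averaging window-to-template distances) is more robust, but as noted above the score is only meaningful when template and window are at roughly the same scale.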
I am trying to build a TensorFlow Dataset API pipeline (TF version 1.8) for a set of images of different sizes. To do this, I am extracting equally sized patches from the images and feeding them to my neural net.
The problem is that in tf.extract_image_patches the patches get stored in the channel dimension. As each image has a different size, the number of patches differs per image, so the shape of each resulting tensor is different and I can't batch them together using the tf.data API.
Can someone suggest changes to my modify_image function below to tackle the issue?
I guess separating the patches into individual images and then batching those together would work, but I can't figure out how to do that.
I want to scan the whole image, so randomly selecting an equal number of patches won't work for me.
import glob
import tensorflow as tf

def modify_image(image):
    '''Add preprocessing functions here.'''
    image = tf.expand_dims(image, 0)
    image = tf.extract_image_patches(
        image,
        ksizes=[1, patch_size, patch_size, 1],
        strides=[1, patch_size, patch_size, 1],
        rates=[1, 1, 1, 1],
        padding='SAME',
        name=None
    )
    image = tf.reshape(image, shape=[-1, patch_size, patch_size, 1])
    return image

def parse_function(image, labels):
    image = tf.read_file(image)
    image = tf.image.decode_image(image)
    labels = tf.read_file(labels)
    labels = tf.image.decode_image(labels)
    image = modify_image(image)
    labels = modify_image(labels)
    return image, labels

def list_files(directory):
    files = glob.glob(directory)
    return files

def load_dataset(img_dir, labels_dir):
    images = list_files(img_dir)
    images = tf.constant(images)
    labels = list_files(labels_dir)
    labels = tf.constant(labels)
    dataset = tf.data.Dataset.from_tensor_slices((images, labels))
    dataset = dataset.map(parse_function)
    return dataset

def make_batches(home_dir, img_dir, labels_dir, batch_size):
    img_dir = home_dir + img_dir
    labels_dir = home_dir + labels_dir
    dataset = load_dataset(img_dir, labels_dir)
    batched_dataset = dataset.batch(batch_size)
    return batched_dataset
The tf.contrib.data.unbatch() transformation might be helpful here, as it can separate the patches from a single image into different elements:
dataset = tf.data.Dataset.from_tensor_slices((images, labels))
dataset = dataset.map(parse_function)
patches_dataset = dataset.apply(tf.contrib.data.unbatch())
batched_dataset = patches_dataset.batch(batch_size)
Note that for tf.contrib.data.unbatch() to work, the number of patches in an image must match the number of elements/rows in labels. For example, if each patch should get the same label, you could achieve this by modifying parse_function() as follows to tf.tile() the labels an appropriate number of times:
def parse_function(images, labels):
    # ...
    return image, tf.tile([labels], tf.shape(image)[0:1])