DL image preparation - resizing - python

I have some tasks involving classification and object ROI.
I have images and labels such as a class and x1,y1,x2,y2 (a standard box).
But the images differ in size. Is there a way to get the box coordinates after resizing?
What I mean is: I have an image 300 px high and 400 px wide, with box coordinates (x1,y1,x2,y2). Before training my DL model I have to resize all images to the same W and H, for example 200*200. Is there a way to calculate the new box coordinates x1_new, y1_new, x2_new, y2_new after resizing?
And are there any tips on what W and H to choose for resizing? The mean of all images? The median?
Thanks!

If you want to map coordinates from an image of size orig_width x orig_height to new_width x new_height, you can scale the box coordinates in the following way:
width_scaled = new_width/orig_width
height_scaled = new_height/orig_height
x1_new = x1*width_scaled
y1_new = y1*height_scaled
x2_new = x2*width_scaled
y2_new = y2*height_scaled
You can plot these coordinates on the resized image to verify them if you like.
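For example, a minimal sketch in Python (the scale_box helper and the sample numbers are just illustrations):

def scale_box(box, orig_size, new_size):
    # rescale an (x1, y1, x2, y2) box from orig_size=(W, H) to new_size=(W, H)
    x1, y1, x2, y2 = box
    width_scaled = new_size[0] / orig_size[0]
    height_scaled = new_size[1] / orig_size[1]
    return (x1 * width_scaled, y1 * height_scaled,
            x2 * width_scaled, y2 * height_scaled)

# a 400x300 image resized to 200x200
print(scale_box((40, 30, 160, 150), (400, 300), (200, 200)))
# -> roughly (20.0, 20.0, 80.0, 100.0)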
There is no fixed rule for choosing the resize dimensions. It depends on factors like the network, the GPU memory you have, the batch size, and the shapes of the smallest/largest images in the dataset. Ideally, the target size should not be so small or so different in aspect ratio that the images become incomprehensible or extremely stretched.
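If you want a data-driven starting point, a small sketch like this prints the median width and height of the dataset (the 'images' folder name is a placeholder):

import os
import numpy as np
from PIL import Image

folder = 'images'  # placeholder: path to your training images
sizes = [Image.open(os.path.join(folder, f)).size for f in os.listdir(folder)]
widths, heights = zip(*sizes)
print('median size:', int(np.median(widths)), 'x', int(np.median(heights)))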
You can refer to this post to get an idea of image resizing

Related

How to treat different size images in ML pipeline

I'm asking a general question about image processing applied to a machine learning pipeline. In this post, I will use "ML" for any algorithm that is not deep learning (i.e. that doesn't use a neural network).
I'm developing a classifier to catalog .png images of different clothes. I have labels (for each image I know the category), so it's a supervised learning problem.
My objective is to use PCA to reduce the problem's dimensionality and then use bag of visual words to perform the classification. I'm using python for this project.
The problem is that each photo has a different size and a different ratio between width and height (so I can't simply resize them to a fixed width, because the resulting heights would differ from image to image).
My (inelegant) solution is to fix the width at 200 px and then pad each image with rows of zeros (each image becomes a NumPy array of maximum_h rows, with each row width long).
Here is the script:
# helper function to load and resize images
from PIL import Image
import numpy as np
from tqdm import tqdm

def get_image(image_path: str, resize=True, w=300):
    """
    :param image_path: string, path of the image
    :param resize: boolean, if True the image is resized. Default: True
    :param w: integer, width of the resized image
    :return: greyscale PIL image, or None if the file is corrupted
    """
    try:
        image = Image.open(image_path).convert("L")
        if resize:
            # scale the height to preserve the aspect ratio at width w
            wpercent = w / float(image.size[0])
            hsize = int(float(image.size[1]) * wpercent)
            image = image.resize((w, hsize), Image.ANTIALIAS)
        return image
    except Exception:
        # some files are corrupted, e.g. AI19/04442.png and AI18/02971.png
        return None
def extract_images(paths: list, categories: list, w: int, maximum_h: int):
    A = np.zeros([len(paths), w * maximum_h])
    y = []
    counter = 0
    for image_path, label in tqdm(zip(paths, categories)):
        im = get_image(image_path, w=w)
        if im is not None:
            # pad the image with zero rows so every image has maximum_h rows
            h, im_w = np.array(im).shape
            delta_h = maximum_h - h
            zeros_ = np.zeros((delta_h, im_w), dtype=int)
            im = np.concatenate((im, zeros_), axis=0)
            A[counter, :] = im.reshape(1, -1)
            y.append(label)
            counter += 1
        else:
            continue
    return (A, y)
The problem here is that the classifier performs badly (about 20% accuracy), because the large number of zeros added to each image increases the dimensionality without adding information.
Looking at the largest eigenvectors from the PCA, I see that a lot of the information is concentrated in these "padding" areas (which confirms my impression).
Is there a better way to handle different-size images in Python?

Extracting same point from images

Hope you guys are doing well.
I have a question about OpenCV and extracting the same point from a number of images.
Say we have a dataset of around 100 images (maybe more, but for this purpose that will suffice).
The images look something like this: [example image: a heap of soil with its highest point marked in red]
As you can see in the image, there is an area marked in RED. I marked it using Paint for this purpose. It indicates the highest point of the heap of soil. All 100 images we have look more or less the same (apart from the crane in the background, but that can be removed with some OpenCV techniques, so it is not an issue). However, the heap of soil can be either on the left-hand side or the right-hand side, and according to its position, the coordinate of the highest point in the heap will change.
So, my question is: how do we find this position, given that the heap can be on either the left or the right side?
Note that this position can be relative to some object (for example, in this image, the midpoint of the crane), or, if the images are of different sizes, we can resize the images to the same dimensions and take the coordinates of the point w.r.t. the image itself.
How do we find the highest point of the heap, though? Should we manually go through each image, label that point, and make a dataset with the images and bounding boxes? Or is there another decent solution to this?
Also, if the soil heap is labelled manually (like shading the required area, i.e. the heap, in an image) using Paint or some other software, would that help too? I cannot think of what to do after this, however.
Thank You.
So I am sure this can be done in a better way than my answer, but here goes:
"Also, if the soil heap is labelled manually (Like shading the required area i.e. heap of an image) using Paint or some other software, would that help too? But I cannot think of anything to do after this."
Regarding that particular statement: if you mark out the region of interest in a distinctive color and shape, like you have done in the example above, you can use OpenCV to detect that region and its coordinates within the image.
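A rough sketch of that idea (the file name and the HSV range for the red marking are assumptions you would tune on your data):

import cv2
import numpy as np

img = cv2.imread('heap.jpg')  # placeholder file name
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# assumed HSV range for a red marking; adjust for the actual color used
mask = cv2.inRange(hsv, np.array([0, 150, 100]), np.array([10, 255, 255]))

# bounding box of the largest marked region (OpenCV 4.x findContours signature)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    print('marked region at', (x, y), 'size', (w, h))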
I think the best solution is deep learning, because the object you want to detect always appears against different backgrounds. You can use Faster R-CNN, or if you need speed, you can get good detections from a well-trained YOLO model. You can find GitHub repos easily. The underlying methods are described in these links:
Faster RCNN https://arxiv.org/abs/1506.01497
Yolo https://pjreddie.com/darknet/yolo/
Basically, you can resize the image. Keep the aspect ratio!
import cv2

def image_resize(image, width=None, height=None, inter=cv2.INTER_AREA):
    dim = None
    (h, w) = image.shape[:2]

    # if neither width nor height is given, return the original image
    if width is None and height is None:
        return image

    if width is None:
        # calculate the ratio of the height and construct the dimensions
        r = height / float(h)
        dim = (int(w * r), height)
    else:
        # calculate the ratio of the width and construct the dimensions
        r = width / float(w)
        dim = (width, int(h * r))

    # resize the image
    resized = cv2.resize(image, dim, interpolation=inter)

    # return the resized image
    return resized
image = image_resize(image, height = ..What you want..)

Extracting separate images from YOLO bounding box coordinates

I have a set of images and their corresponding YOLO coordinates. Now I want to extract the objects that these YOLO coordinates denote into separate images.
But these coordinates are in floating-point notation, and hence I am not able to use slicing.
Here is a sample image, and the corresponding YOLO coordinates are:
labels = [0.536328, 0.5, 0.349219, 0.611111]
I read my image as follows:
image = cv2.imread('frame0.jpg')
Then I wanted to use something like image[y:y+h,x:x+w], as I had seen in a similar question. But the variables are floats, so I tried to convert them into integers using the image dimensions (1280 x 720) like this:
object = [int(label[0]*720), int(label[1]*720), int(label[2]*1280), int(label[3]*1280)]
x,y,w,h = object
But it doesn't extract the correct part of the image, as you can see in the extracted result.
This is part of my training dataset, so I had cropped these parts earlier using some tools, so there should not be any errors in my labels. Also, all the images are cropped incorrectly in this way; I have shown the output for one of the images.
Thanks a lot in advance. Any suggestions would be really helpful!
The labels need to be denormalized differently. YOLO coordinates are [x_center, y_center, width, height], all normalized to [0, 1] by the image width W and height H, and x and y refer to the center of the box, not its top-left corner. So the top-left corner is x = (x_center - width/2) * W and y = (y_center - height/2) * H, while the box size is w = width * W and h = height * H. (Your code also mixes up which values are scaled by 1280 and which by 720.) Here's how I solved it:
import cv2
import matplotlib.pyplot as plt
label = [0.536328, 0.5, 0.349219, 0.611111]
img = cv2.imread('P6A4J.jpg')
H, W, _ = img.shape
w = int(label[2] * W)
h = int(label[3] * H)
x = int(label[0] * W - w / 2)
y = int(label[1] * H - h / 2)
plt.subplot(1,2,1)
plt.imshow(img)
plt.subplot(1,2,2)
plt.imshow(img[y:y+h, x:x+w])
plt.show()
Output: [the full image and the cropped object shown side by side]
Hope this helps!
If you are using YOLOv5, detect.py can do this for you: crops will be saved under runs/detect/exp/crops, with a directory for each class detected.
python detect.py --save-crop
https://github.com/ultralytics/yolov5/issues/5412

Comparing and plotting regions of the same color over a dataset of a few hundred images

A chem student asked me for help with image segmentation and plotting:
A stationary camera takes a picture of the experimental setup every second over a period of a few minutes, yielding around 300 images.
The relevant part of the setup is two adjacent layers of differently-colored foams observed from the side: basically a 2-color sandwich shrinking from both sides, except that one of the foams evaporates a bit faster.
I'd like to segment each of the images in a way that lets me plot both foam regions' "width" against time.
Here is a "diagram" :)
I want to go from here --> To here
Ideally, given a few hundred such shots, in which only the widths change, I get back an array of scalars that I can plot. (It's going to look like a harmonic series on either side of the x-axis.)
I have a bit of Python and MATLAB experience, but I have never used OpenCV or MATLAB's Image Processing Toolbox, and have never really dealt with computer vision in general. Could you throw me a roadmap of what packages/functions to use or steps one should take, and I'll take it from there?
I'm not sure how to address these things:
- selecting at which slice along the length of the layers the algorithm measures the width (i.e. if the foams are a bit uneven), although this can be ignored.
- which library to use to segment regions of the image based on their color (some k-means shenanigans, probably), and how to selectively store the spatial parameters of the resulting segments?
- how to iterate the above over a number of files.
Thank you kindly in advance!
Assuming the layer intensities differ after converting to grayscale (if not, just convert to another color space like HSV or LAB and use one of its components):
img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
First, threshold your grayscale input into a few bands:
ret, thresh1 = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)
ret, thresh2 = cv2.threshold(img, 27, 255, cv2.THRESH_BINARY_INV)
ret, thresh3 = cv2.threshold(img, 77, 255, cv2.THRESH_TRUNC)
ret, thresh4 = cv2.threshold(img, 97, 255, cv2.THRESH_TOZERO)
ret, thresh5 = cv2.threshold(img, 227, 255, cv2.THRESH_TOZERO_INV)
The threshold values should be tuned on your actual data; these are just examples.
Clean up each segmented image with a median filter of radius 9 or larger, since some noise is to be expected. You can also use an ROI here to remove part of the noise, but personally I'm lazy and just write the program to handle all cases and angles.
thresholded_image_aftersmoothing = cv2.medianBlur(thresholded_image, 9)
Each band will correspond to one color (layer). Now you should have N segmented images from one source, where N is the number of layers you wish to track.
Second, use the OpenCV function boundingRect to find the location and width/height of each layer, i.e. run boundingRect on each smoothed, segmented image:
C++: Rect boundingRect(InputArray points)
Python: cv2.boundingRect(points) → retval
Last, each rect has x, y, height, and width attributes. You can use a simple sort on the rect attribute x to order the layers. Run through the whole video to obtain the height-vs-time graph for each layer (identified by its x).
Rect API public attributes:
_Tp height // this is what you are looking for
_Tp width
_Tp x // this tells you the position of the band
_Tp y
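Putting these steps together, a minimal sketch (the file name and threshold values are placeholders to tune on your data):

import cv2

img = cv2.imread('frame.jpg', cv2.IMREAD_GRAYSCALE)  # placeholder file name

# one binary band per layer; thresholds must be tuned on your images
ret, band1 = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)
ret, band2 = cv2.threshold(img, 27, 255, cv2.THRESH_BINARY_INV)

for band in (band1, band2):
    smoothed = cv2.medianBlur(band, 9)
    # bounding box of the largest blob in this band (OpenCV 4.x signature)
    contours, _ = cv2.findContours(smoothed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
        print('band at x =', x, 'height =', h)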
By plotting the corresponding heights (|AB| or |CD|) over time, you can obtain the graph you need.
A more robust way is to use a Kalman filter to track the position and height, as I would expect some bubbles to occur and interfere with the measured height of the layers.
To be honest, I didn't expect a chem student to be good at this. Haha, good luck!
If anything goes wrong you can find me here, or email me if I'm not watching Stack Overflow.
You can select a region of interest straight down the middle of the foams, a few pixels wide. If you stack these regions for each image, the result will show the shrinkage over time.
If, for example, you use a 3-pixel-wide ROI, the result for 300 images will be a 900-pixel-wide image, where the left is the start of the experiment and the right is the end. The following image can help you understand:
Though I have not fully tested it, this code should work. Note that there must only be images in the folder you reference.
import cv2
import numpy as np
import os

# path to the folder that holds the images
path = '.'

# dimensions of the roi
x = 0
y = 0
w = 3
h = 100

# store references to all images
all_images = os.listdir(path)

# sort images
all_images.sort()

# create empty result array
result = np.empty([h, 0, 3], dtype=np.uint8)

for image in all_images:
    # load image
    img = cv2.imread(path + '/' + image)
    # get the region of interest
    roi = img[y:y+h, x:x+w]
    # add the roi to the previous results
    result = np.hstack((result, roi))

# optional: save result as image
# cv2.imwrite('result.png', result)

# display result - can also plot with matplotlib
cv2.imshow('Result', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
Update after question edit:
If the foams have different colors, you can easily separate them by color by converting the image to HSV and using inRange (example). This creates a mask (a 2D array with values from 0-255, one for each pixel) that you can use to calculate the average height and extract the parameters and area of each region.
You can find a script to help you find the HSV colors for separation on this GitHub.
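A minimal sketch of that separation (the file name and HSV bounds are assumptions to tune on your data):

import cv2
import numpy as np

img = cv2.imread('foam.jpg')  # placeholder file name
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# assumed HSV range for one foam color; tune with a color-picker script
mask = cv2.inRange(hsv, np.array([100, 50, 50]), np.array([130, 255, 255]))

# counting masked pixels per column gives that foam's height profile
heights = (mask > 0).sum(axis=0)
print('average height in pixels:', heights.mean())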

Topography height prediction from 2D image

I would like to train on 2D images with corresponding per-pixel height information. I have a bunch of 2D images taken from a topography where the height of each pixel is also known. Is there any way I can use deep learning to train on the images together with the pixel height information?
I have already tried inferring some features from the images and pixel heights and relating them with regression methods such as SVM, but I have not yet obtained satisfactory results when predicting the pixel heights of new images.
How about using the pixel height values as labels, and the images (RGB I assume, so 3 channels) as the training set? Then you can just run supervised learning. Although I am not sure how you could recover height just by looking at an image; even humans would have trouble doing that after seeing many images. I think you would need some kind of reference point.
To convert an image into a 3D array of values (the 3rd dimension being the color channels):
from keras.preprocessing import image
# loads RGB image as PIL.Image.Image type
img = image.load_img(img_file_path, target_size=(120, 120))
# convert PIL.Image.Image type to 3D tensor with shape (120, 120, 3)
x = image.img_to_array(img)
There are a number of other ways too: Convert an image to 2D array in python
In terms of assigning labels to images (here the labels are the pixel heights), it would be as simple as creating your training set x_train of shape (nb_images, 120, 120, 3) and labels y_train of shape (nb_images, 120, 120, 1), then running supervised learning on these until, for each image in x_train, the model can predict each corresponding value in the height set y_train within a certain error.
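As a starting point, here is a minimal fully-convolutional sketch in Keras (the layer sizes are arbitrary assumptions, not a tested architecture):

from keras.models import Sequential
from keras.layers import Conv2D

# RGB image in, one height value per pixel out; 'same' padding keeps 120x120
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(120, 120, 3)),
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    Conv2D(1, (3, 3), activation='linear', padding='same'),
])
model.compile(optimizer='adam', loss='mse')

# x_train: (nb_images, 120, 120, 3), y_train: (nb_images, 120, 120, 1)
# model.fit(x_train, y_train, epochs=10, batch_size=16)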
