How to crop out the annotation box and everything within it? - python

After running yolov8, the algorithm annotated the following picture: Density-Area
My goal is to crop out a large number of these pictures to use in the further analysis. So, I want everything within the bounding box saved, and everything else outside of it removed.
I tried using torch, numpy, cv2, and PIL but haven't been successful.
import torch
import torchvision
from PIL import Image
# Load the image
image = Image.open("path to .jpg")
# Define the model and download the pre-trained weights
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True, weights=None)
# Set the model to evaluation mode
model.eval()
# Transform the image to a tensor
transform = torchvision.transforms.ToTensor()
image_tensor = transform(image)
# Make predictions on the image using the model
predictions = model([image_tensor])
# Extract the bounding boxes and object labels from the predictions
boxes = predictions[0]['boxes'].tolist()
labels = predictions[0]['labels'].tolist()
# Crop the image for each object detected
for i in range(len(boxes)):
bbox = tuple(boxes[i])
object_label = labels[i]
object_image = image.crop(bbox)
object_image.save(f"image_save.jpg")

The image is just an nd-array, so just use array indexing to perform the cropping operation you desire.
For example I assume your bounding boxes are of the form [xmin,ymin,xmax,ymax].
for i in range(len(boxes)):
object_label = labels[i]
object_image = image_tensor
crop = object_image[:,ymin:ymax,xmin:xmax]
# permute color dimension last
crop = crop.permute(1,2,0)
# convert from tensor to numpy array
crop = crop.data.numpy()
# swap from RGB to BGR (per opencv convention)
crop = crop[:,:,::-1]
# save
cv2.imwrite("output_image.jpg",crop)
I'm sure you could accomplish this working directly with the PIL image objects as well but more generally in response to your comment: NO, you cannot crop an image without providing the coordinates of the cropping bounding box.

Related

Scaling pixels between -1 and 1 using cv2.Normalize()

def preprocess(self):
# Import image
pic1 = self.path
raw_image = cv2.imread(pic1)
#cv2.imshow('Raw image',raw_image)
#cv2.waitKey(0)
# Resize image
dim = (320,180)
resized = cv2.resize(raw_image, dim)
#cv2.imshow('Resized Image',resized)
#cv2.waitKey(0)
# Scale image
scaled = cv2.normalize(resized, None, alpha=-1, beta=1, norm_type=cv2.NORM_MINMAX,dtype=cv2.CV_32F)
#cv2.imshow('Scaled Image',scaled)
#cv2.waitKey(0)
return scaled
I'm trying to scale the pixel values of "raw_image" to within the range -1 to 1 as part of a pre-process for identifying an object using machine learning. Essentially, a camera takes a picture, resizes and scales the image to the same size as the images within a dataset used for training and validating. Then that image is inferred by the model generated using model.fit() to detect what the object in the image actually is.
The question here is: " Is this scaling function correct for putting the pixel values in the range of -1 to 1?" It appears SUPER dark when I use cv2.imshow and I'm afraid the model isn't recognizing it properly.

Python OpenCV cannot display image correctly after transformation

Today I was trying to compress the image below, using sklearn's PCA algorithm in Python.
Because the image is RGB (3 channels), I first reshaped the image, so that it becomes 2D. Then, I applied the PCA algorithm on the data to compress the image. After the image was compressed, I inversed the PCA transformation and reshaped the approximated (decompressed) image back to its original shape.
However, when I tried to display the approximated image I got this weird result here:
While the image is stored correctly with the cv2.imwrite function, OpenCV fails to display the image correctly using cv2.imshow. Do You have any idea why this might be happening?
My code is below:
from sklearn.decomposition import PCA
import cv2
import numpy as np
image_filepath = 'baby_yoda_image.jpg'
# Loading image from disk.
input_image = cv2.imread(image_filepath)
height = input_image.shape[0]
width = input_image.shape[1]
channels = input_image.shape[2]
# Reshaping image to perform PCA.
print('Input image shape:', input_image.shape)
#--- OUT: (533, 800, 3)
reshaped_image = np.reshape(input_image, (height, width*channels))
print('Reshaped Image:', reshaped_image.shape)
#--- OUT: (533, 2400)
# Applying PCA transformation to image. No whitening is applied to prevent further data loss.
n_components = 64
whitening = False
pca = PCA(n_components, whitening)
compressed_image = pca.fit_transform(reshaped_image)
print('PCA Compressed Image Shape:', compressed_image.shape)
#--- OUT: (533, 64)
print('Compression achieved:', np.around(np.sum(pca.explained_variance_ratio_), 2)*100, '%')
#--- OUT: 97.0 %
# Plotting images.
approximated_image = pca.inverse_transform(compressed_image)
approximated_original_shape_image = np.reshape(approximated_image, (height, width, channels))
cv2.imshow('Input Image', input_image)
cv2.imshow('Compressed Image', approximated_original_shape_image)
cv2.waitKey()
Thanks in advance.
Finally, I found a solution to this problem, thanks to #fmw42 . After the transformation, there were negative values in the pixels and also values that exceeded 255.
Luckily, OpenCV does take care of this problem with this line of code:
approximated_uint8_image = cv2.convertScaleAbs(approximated_original_shape_image)

Resizing non uniform images with precise face location

I work at a studio that does school photos and we are trying to make a script to eliminate the job of cropping each photo to a template. The photos we work with are fairly uniform but they vary in resolution and head position a bit. I took up the mantle of trying to write the script with my fairly limited Python knowledge and through a lot of trial and error and online resources I think I have got most of the way there.
At the moment I am trying to figure out the best way to have the image crop from the NumPy array with the head where I want and I just cant find a good flexible solution. The head needs to be positioned slightly differently for pose 1 and pose 2 so its needs to be easy to change on the fly (Probably going to implement some sort of simple GUI to input stuff like that, but for now I can just change the code).
I also need to be able to change the output resolution of the photo so they are all uniform (2000x2500). Anyone have any ideas?
At the moment this is my current code, it just saves the detected face square:
import cv2
import os.path
import glob
# Cascade path
cascPath = 'haarcascade_frontalface_default.xml'
# Create the haar cascade
faceCascade = cv2.CascadeClassifier(cascPath)
#Check for output folder and create if its not there
if not os.path.exists('output'):
os.makedirs('output')
# Read Images
images = glob.glob('*.jpg')
for c, i in enumerate(images):
image = cv2.imread(i, 1)
# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Find face(s) using cascade
faces = faceCascade.detectMultiScale(
gray,
scaleFactor=1.1, # size of groups
minNeighbors=5, # How many groups around are detected as face for it to be valid
minSize=(500, 500) # Min size in pixels for face
)
# Outputs number of faces found in image
print('Found {0} faces!'.format(len(faces)))
# Places a rectangle on face
for (x, y, w, h) in faces:
imgCrop = image[y:y+h,x:x+w]
if len(faces) > 0:
#Saves Images to output folder with OG name
cv2.imwrite('output/'+ i, imgCrop)
I can crop using it like this:
# Crop Padding
left = 300
right = 300
top = 400
bottom = 1000
for (x, y, w, h) in faces:
imgCrop = image[y-top:y+h+bottom, x-left:x+w+right]
but that outputs pretty random resolutions and changes based on the image resolution
TL;DR
To set a new resolution with the dimension, you can use cv2.resize. There may be a pixel loss so you can use the interpolation method.
The newly resized image may be in BGR format, so you may need to convert to RGB format.
cv2.resize(src=crop, dsize=(2000, 2500), interpolation=cv2.INTER_LANCZOS4)
crop = cv2.cvtColor(crop, cv2.COLOR_BGR2RGB) # Make sure the cropped image is in RGB format
cv2.imwrite("image-1.png", crop)
Suggestion:
One approach is using python's face-recognition library.
The approach is using two sample images for training.
Predict the next image based on training images.
For instance, The followings are the training images:
We want to predict the faces in the below image:
When we get the facial encodings of the training images and apply to the next image:
import face_recognition
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw
# Load a sample picture and learn how to recognize it.
first_image = face_recognition.load_image_file("images/ex.jpg")
first_face_encoding = face_recognition.face_encodings(first_image)[0]
# Load a second sample picture and learn how to recognize it.
second_image = face_recognition.load_image_file("images/index.jpg")
sec_face_encoding = face_recognition.face_encodings(second_image)[0]
# Create arrays of known face encodings and their names
known_face_encodings = [
first_face_encoding,
sec_face_encoding
]
print('Learned encoding for', len(known_face_encodings), 'images.')
# Load an image with an unknown face
unknown_image = face_recognition.load_image_file("images/babes.jpg")
# Find all the faces and face encodings in the unknown image
face_locations = face_recognition.face_locations(unknown_image)
face_encodings = face_recognition.face_encodings(unknown_image, face_locations)
# Convert the image to a PIL-format image so that we can draw on top of it with the Pillow library
# See http://pillow.readthedocs.io/ for more about PIL/Pillow
pil_image = Image.fromarray(unknown_image)
# Create a Pillow ImageDraw Draw instance to draw with
draw = ImageDraw.Draw(pil_image)
# Loop through each face found in the unknown image
for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
best_match_index = np.argmin(face_distances)
draw.rectangle(((left, top), (right, bottom)), outline=(0, 0, 255), width=5)
# Remove the drawing library from memory as per the Pillow docs
del draw
# Display the resulting image
plt.imshow(pil_image)
plt.show()
The output will be:
The above is my suggestion. When you create a new resolution with the current image, there will be a pixel loss. Therefore you need to use an interpolation method.
For instance: after finding the face locations, select the coordinates in the original image.
# Add after draw.rectangle function.
crop = unknown_image[top:bottom, left:right]
Set new resolution with the size 2000 x 2500 and interpolation with CV2.INTERN_LANCZOS4.
Possible Question: Why CV2.INTERN_LANCZOS4?
Of course, you can select whatever you like, but in this post CV2.INTERN_LANCZOS4 was suggested.
cv2.resize(src=crop, dsize=(2000, 2500), interpolation=cv2.INTER_LANCZOS4)
Save the image
crop = cv2.cvtColor(crop, cv2.COLOR_BGR2RGB) # Make sure the cropped image is in RGB format
cv2.imwrite("image-1.png", crop)
Outputs are around 4.3 MB Therefore I can't display in here.
From the final result, we clearly see and identify faces. The library precisely finds the faces in the image.
Here what you can do:
Either you can use the training images of your own-set, or you can use the example above.
Apply the face-recognition function for each image, using the trained face-locations and save the results in the directory.
here is how I got it to crop how I wanted, this is added right below the "output number of faces" function
#Get the face postion and output values into variables, might not be needed but I did it
for (x, y, w, h) in faces:
xdis = x
ydis = y
w = w
h = h
#Get scale value by dividing wanted head hight by detected head hight
ws = 600/w
hs = 600/h
#scale image to get head to right size, uses bilinear interpolation by default
scale = cv2.resize(image,(0,0),fx=hs,fy=ws)
#calculate head postion for given values
sxdis = int(xdis*ws) #applying scale to x distance and turning it into a integer
sydis = int(ydis*hs) #applying scale to y distance and turning it into a integer
sycent = sydis+300 #adding half head hight to get center
ystart = sycent-700 #subtract where you want the head center to be in pixels, this is for the vertical
yend = ystart+2500 #Add whatever you want vertical resolution to be
xcent = sxdis+300 #adding half head hight to get center
xstart = xcent-1000 #subtract where you want the head center to be in pixels, this is for the horizontal
xend = xstart+2000 #add whatever you want the horizontal resolution to be
#Crop the image
cropped = scale[ystart:yend, xstart:xend]
Its a mess but it works exactly how I wanted it to work.
ended up going with openCV instead of switching to python-Recognition because of speed but I might switch over if I can get multithreading to work in python-recognition.

How to convert data array to show easy to understand results

I tried to make a algorithm using Teachable Machine to receive a picture and see if it fall under one of two categories of pictures (e.g dogs or humans), but after I exported the code that was given I couldn't make sense of how I could make the results that were given via array to turn into something that anyone can understand. So far it only shows a list of two numbers (e.g [[0.00058185 0.99941814]] the first number being dogs and the second one humans) I wanted to make it to show which one of the two numbers means dog and human and the percentage of both or to make it to only shows which one is the most probable to be.
Here's the code:
import tensorflow.keras
from PIL import Image, ImageOps
import numpy as np
from decimal import Decimal
# Disable scientific notation for clarity
np.set_printoptions(suppress=True)
# Load the model
model = tensorflow.keras.models.load_model('keras_model.h5')
# Create the array of the right shape to feed into the keras model
# The 'length' or number of images you can put into the array is
# determined by the first position in the shape tuple, in this case 1.
data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)
# Replace this with the path to your image
image = Image.open('test_photo.jpg')
#resize the image to a 224x224 with the same strategy as in TM2:
#resizing the image to be at least 224x224 and then cropping from the center
size = (224, 224)
image = ImageOps.fit(image, size, Image.ANTIALIAS)
#turn the image into a numpy array
image_array = np.asarray(image)
# display the resized image
image.show()
# Normalize the image
normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1
# Load the image into the array
data[0] = normalized_image_array
# run the inference
prediction = model.predict(data)
print(prediction)
input('Press ENTER to exit')
Using argmax and max does what you want:
"Prediction is {} with {}% probability".format(["dog", "human"][np.argmax(prediction)], round(np.max(prediction)*100,2))
'Prediction is human with 99.94% probability'

Show image from fetched data using openCV

I've been using datasets from sklearn. And I want to show image from 'MNIST original' using openCV.imshow
Here is part of my code
dataset = datasets.fetch_mldata('MNIST original')
features = np.array(dataset.data, 'int16')
labels = np.array(dataset.target, 'int')
list_hog_fd = []
deskewed_images = []
for img in features:
cv2.imshow("digit", img)
deskewed_images.append(deskew(img))
"digit" window appears but it is definitely not an digit image. How can I access real image from dataset?
Shape
MNIST image datasets generally are distributed and used as a 1D vector of 784 values.
However, in order to show it as image, you need to convert it to a 2D matrix with 28*28 values.
Simply using img = img.reshape(28,28) might work in your case.

Categories

Resources