I have trained cascade classifiers with OpenCV for object classification.
I have three classes, so I ended up with three *.xml files.
I know that one particular region of the image must belong to one of the three classes.
However, OpenCV only provides the detectMultiScale function, so I have to scan the whole image (or ROI) to find all possible objects in it.
Is there a method to classify whether a given image (or ROI) matches a specified object or not?
Thank you!
From your question I understand that you want to classify three separate ROIs of an image. You might want to create three crops for the defined ROIs:
import cv2
img = cv2.imread("full_image.png")
crop_img1 = img[y:y+h, x:x+w]  # x, y, w, h define the first ROI
# create crop_img2 and crop_img3 analogously for the other two ROIs
Then apply a classifier to each of the three cropped images.
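As a minimal sketch of that second step, assuming the three trained cascades are saved as class1.xml, class2.xml and class3.xml (placeholder names) and that the ROI coordinates are already known, you could run each cascade's detectMultiScale on the crop and treat a non-empty result as a match:

import cv2

img = cv2.imread("full_image.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
x, y, w, h = 0, 0, 100, 100  # replace with your known ROI
crop = gray[y:y+h, x:x+w]

# Placeholder cascade file names - use your three trained *.xml files
cascades = {
    "class1": cv2.CascadeClassifier("class1.xml"),
    "class2": cv2.CascadeClassifier("class2.xml"),
    "class3": cv2.CascadeClassifier("class3.xml"),
}

for name, cascade in cascades.items():
    hits = cascade.detectMultiScale(crop, scaleFactor=1.1, minNeighbors=5)
    if len(hits) > 0:
        print(name, "matched", len(hits), "candidate window(s)")

Whichever cascade fires on the crop is the most likely class; if more than one fires, you could compare the number of returned windows or raise minNeighbors until only one remains.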
Is there an image segmentation model that segments images without assigning labels (or bounding boxes) to the segmented parts? I need to segment my image even for objects it was not trained on, so I suppose I should use a model that does not rely on specific labels for segmentation.
You seem to be looking for unsupervised image segmentation?
https://github.com/kanezaki/pytorch-unsupervised-segmentation
https://github.com/Mirsadeghi/Awesome-Unsupervised-Segmentation
include some potential solutions.
I am not getting accurate results when detecting a face with OpenCV.
Here is my code:
import cv2

# Create a CascadeClassifier object; the XML file contains the trained face features
face_cascade = cv2.CascadeClassifier("C:/Users/yash/AppData/Local/Programs/Python/Python35/Lib/site-packages/cv2/data/haarcascade_frontalface_default.xml")

# Read the image as is
img = cv2.imread("profile.JPG")

# Convert the colour image to grayscale
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Search for face rectangle coordinates; scaleFactor=1.05 shrinks the image by 5%
# at each pyramid step, so a smaller value gives a finer (but slower) search
faces = face_cascade.detectMultiScale(gray_img, scaleFactor=1.05, minNeighbors=5)

# Draw a green rectangle around each detected face
for x, y, w, h in faces:
    img = cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 3)

# Downscale for display
resized_img = cv2.resize(img, (int(img.shape[1]/2), int(img.shape[0]/2)))
cv2.imshow("face detection", resized_img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Here is the image I am trying to detect the face in.
If you only need one face, pass the CV_HAAR_FIND_BIGGEST_OBJECT flag (cv2.CASCADE_FIND_BIGGEST_OBJECT in the Python bindings) as the flags parameter of detectMultiScale.
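A minimal sketch of that call, reusing the variables from the code in the question:

# Keep only the biggest candidate face
faces = face_cascade.detectMultiScale(
    gray_img,
    scaleFactor=1.05,
    minNeighbors=5,
    flags=cv2.CASCADE_FIND_BIGGEST_OBJECT)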
That said, Haar cascades are no longer the best choice for face detection. In OpenCV 4.0 the developers removed the code for training Haar cascades and recommend using a DNN instead, for example the one described here.
Second, the OpenCV developers created an open-source framework for DNN inference, OpenVINO, along with many pretrained models (including ones for face detection). If you want the fastest face detector on a CPU, you should use OpenVINO.
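As a rough illustration of the DNN route with plain OpenCV (not OpenVINO), a sketch using the ResNet-10 SSD face model from the OpenCV samples could look like this; the prototxt/caffemodel file names are assumed to be local copies you have downloaded:

import cv2
import numpy as np

# Assumed local copies of the OpenCV sample face detector files
net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")

img = cv2.imread("profile.JPG")
h, w = img.shape[:2]

# The model expects 300x300 BGR input with these mean values
blob = cv2.dnn.blobFromImage(cv2.resize(img, (300, 300)), 1.0,
                             (300, 300), (104.0, 177.0, 123.0))
net.setInput(blob)
detections = net.forward()

# Keep detections above a confidence threshold and draw them
for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:
        x1, y1, x2, y2 = (detections[0, 0, i, 3:7] * np.array([w, h, w, h])).astype(int)
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 3)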
In addition to @Nuzhny's recommendation, you should use the non-maximum suppression (NMS) algorithm to deal with multiple overlapping detections.
PyImageSearch has a very good article on this topic, along with code, which will help you.
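The core idea, as a minimal NumPy sketch (a simplified variant of the approach popularised by PyImageSearch; boxes are assumed to be (x, y, w, h) rows as returned by detectMultiScale):

import numpy as np

def non_max_suppression(boxes, overlap_thresh=0.3):
    # Keep only weakly-overlapping boxes; boxes are rows of (x, y, w, h)
    if len(boxes) == 0:
        return np.empty((0, 4), dtype=int)
    boxes = np.asarray(boxes, dtype=float)
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 0] + boxes[:, 2]
    y2 = boxes[:, 1] + boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(areas)          # process the largest box last
    keep = []
    while len(order) > 0:
        i = order[-1]                  # current largest remaining box
        keep.append(i)
        rest = order[:-1]
        # Intersection of box i with every remaining box
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        overlap = inter / areas[rest]
        # Drop boxes that overlap the kept box too much
        order = rest[overlap <= overlap_thresh]
    return boxes[keep].astype(int)

# e.g. faces = non_max_suppression(face_cascade.detectMultiScale(gray_img, 1.05, 5))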
I'm trying to do OCR of a scanned document which has handwritten signatures in it. See the image below.
My question is simple: is there a way to still extract the people's names using OCR while ignoring the signatures? When I run Tesseract OCR it fails to retrieve the names. I tried grayscaling, blurring and thresholding with the code below, but without luck. Any suggestions?
import cv2

image = cv2.imread(file_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.GaussianBlur(image, (5, 5), 0)
image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
You can use scikit-image's Gaussian filter with an appropriate sigma to blur the thin signature strokes first, then binarize the image (e.g., with some thresholding function), and then apply morphological operations (such as remove_small_objects, or opening with a suitable structuring element) to remove most of the signature. After that you can try classifying the characters with a sliding window (assuming a classifier already trained on blurred characters like those in the test image). The following shows an example.
import matplotlib.pyplot as plt
from skimage.morphology import binary_opening, square
from skimage.filters import gaussian, threshold_minimum
from skimage.io import imread
from skimage.color import rgb2gray

# Blur the thin signature strokes, then binarize with a minimum threshold
im = gaussian(rgb2gray(imread('lettersig.jpg')), sigma=2)
thresh = threshold_minimum(im)
im = im > thresh  # boolean image

# Opening removes the remaining thin structures
im1 = binary_opening(im, square(3))

plt.figure(figsize=(20, 20))
plt.imshow(im1)
plt.axis('off')
plt.show()
[EDIT]: Use Deep Learning Models
Another option is to pose the problem as an object-detection problem in which the letters are the objects. We can use deep learning, e.g., CNN-based detectors such as Fast R-CNN (with TensorFlow/Keras) or a YOLO model (refer to this article on car detection with YOLO).
I suppose the input pictures are grayscale; otherwise the different colour of the ink might already be a distinguishing feature.
The problem here is that your training set, I would guess, contains almost only 'normal' letters, without the disturbance of a signature, so naturally the classifier won't work on letters covered by signature ink. One way to go could be to extend the training set with letters of this type. Of course it is quite a job to extract and label these letters one by one.
You can use real letters with different signatures on them, but it might also be possible to generate similar letters artificially: you just need different letters with different snippets of signatures overlaid on them. This process can be automated, for example as sketched below.
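A minimal sketch of such an augmentation step, assuming grayscale images with dark ink on a white background and placeholder file names:

import random
import cv2
import numpy as np

# Placeholder file names: a clean character image and a signature image
letter = cv2.imread("letter.png", cv2.IMREAD_GRAYSCALE)
signature = cv2.imread("signature.png", cv2.IMREAD_GRAYSCALE)

# Cut a random snippet out of the signature and bring it to the letter's size
h, w = letter.shape
sh, sw = signature.shape
y0 = random.randint(0, max(0, sh - h))
x0 = random.randint(0, max(0, sw - w))
snippet = cv2.resize(signature[y0:y0 + h, x0:x0 + w], (w, h))

# Overlay: keep the darker (inked) pixel at each position
augmented = np.minimum(letter, snippet)
cv2.imwrite("augmented_letter.png", augmented)

Repeating this with many letters and many signature snippets gives a training set of 'disturbed' characters without manual labelling.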
You may try preprocessing the image with morphological operations.
Opening can remove the thin lines of the signature; the problem is that it may remove punctuation as well.
import cv2

image = cv2.imread(file_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# A cross-shaped 5x5 kernel; match the size to the stroke width you want to remove
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (5, 5))
image = cv2.morphologyEx(image, cv2.MORPH_OPEN, kernel)
You may have to alter the kernel size or shape; just try different settings.
You can also try other OCR providers for this task, for example the Google Cloud Vision API (https://cloud.google.com/vision/). You can upload an image there and test it for free.
You will get a response from the API from which you can extract the text you need. Documentation for extracting that text is also given on the same webpage.
Also check out my own answer from when I faced the same problem; it will help you fetch that text: Convert Google Vision API response to JSON
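For reference, a minimal sketch of calling the API with the google-cloud-vision client library (assuming credentials are already configured and a placeholder file name; in older library versions the image type lives under vision.types.Image):

from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Placeholder file name for the scanned document
with open("scan.png", "rb") as f:
    image = vision.Image(content=f.read())

# document_text_detection is tuned for dense document text
response = client.document_text_detection(image=image)
print(response.full_text_annotation.text)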
I have two different types of images (which I cannot post inline due to reputation, so I've linked them):
Image 1 Image 2
I was trying to extract hand features from the images using OpenCV and Python, with code that looks roughly like this:
import cv2
image = cv2.imread('image.jpg')
blur = cv2.GaussianBlur(image, (5,5), 0)
gray = cv2.cvtColor(blur, cv2.COLOR_BGR2GRAY)
retval, thresh1 = cv2.threshold(gray, 70, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
cv2.imshow('image', thresh1)
cv2.waitKey(0)
The result of which looks like this:
Image 1 Image 2
The change in background in the second image is messing with the cv2.threshold() function, and it's not segmenting the skin parts correctly. Is there a way to do this properly?
As a follow-up question, what is the best way to extract hand features? I tried a Haar cascade and didn't really get results. Should I train my own cascade? What other options do I have?
It's hard to say based on a sample size of two images, but I would try OpenCV's Integral Channel Features (ChnFtrs), which are like supercharged Haar features that can take cues from colour as well as any other image channels you care to create and provide.
In any case, you are going to have to train your own cascades. Separate cascades for front and profile shots of course.
Take out your thresholding by skin colour, because as you've already noticed, it may throw away some or all of the hands depending on the actual subject's skin colour and lighting. ChnFtrs will do the skin detection for you more robustly than a fixed threshold can. (Though for future reference, all humans are actually orange :))
You could eliminate some false positives by only detecting within a bounding box of where you expect the hands to be.
Try both RGB and YUV channels to see what works best. You could also throw in the results of edge detection (say, Canny, maximised across your 3 colour channels) for good measure. At the end, you could cull channels which are underused to save processing if necessary.
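For the edge channel specifically, a tiny sketch of Canny maximised across the three colour channels (the file name is a placeholder):

import cv2
import numpy as np

img = cv2.imread("hand.jpg")  # placeholder file name
# Run Canny on each colour channel and keep the strongest response per pixel
edges = np.max([cv2.Canny(img[:, :, c], 50, 150) for c in range(3)], axis=0)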
If you have much variation in hand pose, you may need to group similar poses and train a separate ChnFtrs cascade for each group. Individual cascades do not have a branching structure, so they do not cope well when the positive samples are disjoint in parameter space. This is, AFAIK, a bit of an unexplored area.
A correctly trained ChnFtrs cascade (or several) may give you a bounding box for the hands, which will help in extracting hand contours, but it can't exclude invalid contours within the same bounding box. Most other object detection routines will also have this problem.
Another option, which may be better/simpler than ChnFtrs, is LINEMOD (a current favourite of mine). It has the advantage that there's no complex training process, nor any training time needed.
I am looking at an application where I have two x-ray images of a single object (a smartphone), acquired at different instants. The intensity content of the two images is therefore different, and I would like to fuse them in order to extract some information about the phone.
Between the two acquisitions the setup was changed slightly, so the phone does not sit at the same pixel location in the two images. To compare them correctly, I need to translate and rotate the images of the phone so that they overlap as much as possible.
For this I am using Python and OpenCV (cv2). I was thinking of thresholding, then finding the coordinates of the two thresholded images, and using those coordinates to map the yellow image onto the red one (or the opposite). The attached image shows what I have obtained so far.
The pseudo-code is as follows:
import cv2
import numpy as np
import matplotlib.pyplot as plt

# img1: image acquired with the first filter; template: image acquired with the second
ret1, thresh1 = cv2.threshold(img1.astype(np.uint8), 200, 255, cv2.THRESH_BINARY_INV)
ret2, thresh2 = cv2.threshold(template.astype(np.uint8), 200, 255, cv2.THRESH_BINARY_INV)

plt.figure(1)
plt.subplot(121)
plt.imshow(thresh1)
plt.subplot(122)
plt.imshow(thresh2)
plt.show()
where img1 is the image acquired with the first filter and template is the image acquired with the second filter. One can see that the phones are at different positions in the yellow and green images, respectively.
My question is how to perform the next step: how can I find the coordinates in these thresholded images and then superimpose the images of the two phones? Is this the right strategy at all, or are there better solutions?
I have been looking at this link on template matching, but for the moment I have had no success.
Hi,
Image registration did the trick, thanks! I followed this tutorial:
image-registration and managed to do what I was looking for.
Thanks!
Greg
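For anyone landing on the same problem, a minimal feature-based registration sketch with OpenCV (assuming ORB keypoints and a rigid/affine transform are sufficient for this setup; file names are placeholders) could look like this:

import cv2
import numpy as np

# Placeholder file names; assumes both x-ray images have roughly the same scale
img1 = cv2.imread("filter1.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("filter2.png", cv2.IMREAD_GRAYSCALE)

# Detect and match ORB keypoints between the two acquisitions
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(template, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# Estimate rotation + translation (+ uniform scale) with RANSAC and warp img1 onto template
M, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
aligned = cv2.warpAffine(img1, M, (template.shape[1], template.shape[0]))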