I have two different types of images (which I cannot post inline due to reputation, so I've linked them):
Image 1 Image 2
I was trying to extract hand features from the images using OpenCV and Python. My attempt looks roughly like this:
import cv2
image = cv2.imread('image.jpg')
blur = cv2.GaussianBlur(image, (5,5), 0)
gray = cv2.cvtColor(blur, cv2.COLOR_BGR2GRAY)
retval, thresh1 = cv2.threshold(gray, 70, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
cv2.imshow('image', thresh1)
cv2.waitKey(0)
The result of which looks like this:
Image 1 Image 2
The change in background in the second image is throwing off the cv2.threshold() function, and it's not isolating the skin parts correctly. Is there a way to do this right?
As a follow-up question, what is the best way to extract hand features? I tried a Haar cascade and didn't really get results. Should I train my own cascade? What other options do I have?
It's hard to say based on a sample size of two images, but I would try OpenCV's Integral Channel Features (ChnFtrs), which are like supercharged Haar features that can take cues from colour as well as any other image channels you care to create and provide.
In any case, you are going to have to train your own cascades. Separate cascades for front and profile shots of course.
Take out your thresholding by skin colour, because as you've already noticed, it may throw away some or all of the hands depending on the actual subject's skin colour and lighting. ChnFtrs will do the skin detection for you more robustly than a fixed threshold can. (Though for future reference, all humans are actually orange :))
You could eliminate some false positives by only detecting within a bounding box of where you expect the hands to be.
Try both RGB and YUV channels to see what works best. You could also throw in the results of edge detection (say, Canny, maximised across your 3 colour channels) for good measure. At the end, you could cull channels which are underused to save processing if necessary.
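As an illustration of the "Canny maximised across your 3 colour channels" idea, here is a minimal sketch (the filename and Canny thresholds are assumptions):

import cv2
import numpy as np

img = cv2.imread('hand.jpg')          # hypothetical input image
channels = cv2.split(img)             # B, G, R

# Run Canny on each colour channel separately, then keep the strongest
# response per pixel so an edge visible in any channel survives.
edges = np.zeros(img.shape[:2], dtype=np.uint8)
for ch in channels:
    edges = np.maximum(edges, cv2.Canny(ch, 100, 200))

cv2.imshow('max-channel Canny', edges)
cv2.waitKey(0)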
If you have much variation in hand pose, you may need to group similar poses and train a separate ChnFtrs cascade for each group. Individual cascades do not have a branching structure, so they do not cope well when the positive samples are disjoint in parameter space. This is, AFAIK, a bit of an unexplored area.
A correctly trained ChnFtrs cascade (or several) may give you a bounding box for the hands, which will help in extracting hand contours, but it can't exclude invalid contours within the same bounding box. Most other object detection routines will also have this problem.
Another option, which may be better/simpler than ChnFtrs, is LINEMOD (a current favourite of mine). It has the advantage that there's no complex training process, nor any training time needed.
Related
I would like to extract logos from golf balls for further image processing.
I have already tried different methods.
1) I wanted to use the grayscale values of the image to locate the logo and then crop it out. Because of the many different logos and the black border around the images, this method unfortunately failed.
2) As my next approach, I tried removing the black background first and then repeating the procedure from 1), but this also failed because there is a dark shadow in the lower-left corner, which the grayscale method also recognizes as the "logo". Masking more of the border is not a solution either, because logos lying on the border would then be cut away or only half detected.
3) I used the Canny edge detection algorithm from the OpenCV library. The detection looked very promising, but I was not able to extract only the logo from it, because the edge of the golf ball was also detected.
Any solution is welcome. Please forgive my English; I am also quite a beginner at programming. There is probably a very simple solution to my problem, but I thank you in advance for your help.
Here are two example images: first the type of image from which the logos should be extracted, and then how the image should look after extraction.
Thank you very much. Best regards T
This is essentially "adaptive" thresholding, except this approach doesn't need to threshold. It adapts to the illumination, leaving you with a perfectly fine grayscale image (or color, if extended to do that).
median blur (large kernel size) to estimate ball/illumination
division to normalize
illumination:
normalized (and scaled a bit):
thresholded with Otsu:
import cv2 as cv
import numpy as np

def process(im, r=80):
    med = cv.medianBlur(im, 2*r+1)
    with np.errstate(divide='ignore', invalid='ignore'):
        normalized = np.where(med <= 1, 1, im.astype(np.float32) / med.astype(np.float32))
    return (normalized, med)

# ball1: the grayscale input image, e.g. cv.imread('ball1.jpg', cv.IMREAD_GRAYSCALE)
normalized, med = process(ball1, 80)
# imshow(med)
# imshow(normalized * 0.8)

ret, thresh = cv.threshold((normalized.clip(0, 1) * 255).astype('u1'), 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
# imshow(thresh)
Adaptive thresholding can do the trick.
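For reference, a minimal sketch of that with OpenCV (the block size and constant C are assumptions to tune per image):

import cv2

img = cv2.imread('ball.jpg', cv2.IMREAD_GRAYSCALE)   # hypothetical filename
# Threshold each pixel against the mean of its local neighbourhood, which
# copes with the uneven illumination across the ball.
mask = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                             cv2.THRESH_BINARY_INV, 51, 10)
cv2.imshow('adaptive', mask)
cv2.waitKey(0)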
Please suggest a new approach, or at least a way to make any of the methods below robust enough to detect glare at a good rate.
I have some images (mostly taken of a computer screen) where some kind of camera flash or glare is present. I want to discard these types of images, or at least notify the user to retake them. How could I do that?
I do not have enough data to train a deep-learning classification model such as Fast Glare Detection.
Here is the data: more than 70 images.
I have tried these few things in order:
Bright-area detection using OpenCV's cv2.minMaxLoc function, but it ALWAYS returns a location no matter what, and it mostly fails for my type of images.
I found this code for removal, but it is in MATLAB.
This code uses CLAHE adjustment, but the problem is that it removes glare rather than detecting it.
This one looks promising but is not robust enough for my type of images.
The final piece of code below is roughly what I need, but can someone help me make it robust? For example, by tuning or changing these thresholds, or by using binarization and closing (increasing the white area with dilation and then removing black noise with erosion), so that it generalises to all images.
import cv2

def get_image_stats(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (25, 25), 0)
    no_text = gray * ((gray / blurred) > 0.99)                      # select background only
    no_text[no_text < 10] = no_text[no_text > 20].mean()            # convert black pixels to mean value
    no_bright = no_text.copy()
    no_bright[no_bright > 220] = no_bright[no_bright < 220].mean()  # disregard bright pixels
    std = no_bright.std()
    bright = (no_text > 220).sum()
    if no_text.mean() < 200 and bright > 8000:
        return True
    return False
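For reference, a rough sketch of the binarization-plus-closing idea mentioned above (the threshold, kernel size, and area cutoff are assumptions that would need tuning on the image set):

import cv2

def has_glare(img_path, bright_thresh=230, min_area=8000):
    # Hypothetical helper: True if a large, compact bright blob exists in the image.
    gray = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2GRAY)
    ret, mask = cv2.threshold(gray, bright_thresh, 255, cv2.THRESH_BINARY)
    # Closing (dilation followed by erosion) joins nearby bright pixels into one blob.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return any(cv2.contourArea(c) > min_area for c in contours)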
Here are a few examples:
I am working with the Google Vision API and Python to apply text_detection, the OCR function of the Google Vision API that detects the text in an image and returns it as output. My original image is the following:
I have used the following different algorithms:
1) Apply text_detection to the original image
2) Enlarge the original image by 3 times and then apply text_detection
3) Apply Canny, findContours, drawContours on a mask (with OpenCV) and then text_detection to this
4) Enlarge the original image by 3 times, apply Canny, findContours, drawContours on a mask (with OpenCV) and then text_detection to this
5) Sharpen the original image and then apply text_detection
6) Enlarge the original image by 3 times, sharpen the image and then apply text_detection
The ones that fare the best are (2) and (5). On the other hand, (3) and (4) are probably the worst among them.
The major problem is that in most cases text_detection does not detect the minus sign, especially the one in '-1.00'.
Also, I do not know why, but sometimes it does not detect '-1.00' itself at all, which is quite surprising, as it does not have any significant problem with the other numbers.
What do you suggest I do to accurately detect the minus sign and, in general, the numbers?
(Keep in mind that I want to apply this algorithm to different boxes so the numbers may not be at the same position as in this image)
I dealt with the same problem. Your end goal is to correctly identify the text; for the OCR conversion you are using a third-party service or tool (Google API / Tesseract etc.).
All the approaches you are talking about become less meaningful, because whatever transformations you do with OpenCV will be repeated by Tesseract. The best thing you can do is supply the input in an easy format.
What worked best for me was breaking the image into parts (boxes - squares and rectangles - using the sample code for identifying rectangles in all channels from the OpenCV repo examples, https://github.com/opencv/opencv/blob/master/samples/python/squares.py), then cropping each part and sending it for OCR separately, as sketched below.
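A minimal sketch of the crop-and-OCR-by-parts step (the rectangle list is a placeholder for the squares.py output, and the saved crops would be sent to your OCR service):

import cv2

img = cv2.imread('scoreboard.jpg')        # hypothetical input
rectangles = [(40, 60, 200, 80)]          # (x, y, w, h) boxes from squares.py; example values

for i, (x, y, w, h) in enumerate(rectangles):
    crop = img[y:y + h, x:x + w]
    cv2.imwrite('crop_%d.png' % i, crop)  # send each crop to the OCR service separately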
Since you are using the Google Vision API, which detects the text in the image, it is not obvious that a text-detection API will detect negative numbers in the first place. Assuming that you cannot re-train the API for your case, I would recommend writing a simple script that filters the contours on the basis of their shape and size. Using this script you can easily segment out the minus signs and then merge them with the output from the Google Vision API, as follows:
import cv2
import numpy as np

img = cv2.imread("path/to/img.jpg", 0)
ret, thresh = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)
# [-2:] keeps this working whether findContours returns 2 or 3 values (OpenCV 4 vs 3).
contours, hierarchy = cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2:]

# Filter the contours: a minus sign is a small, wide, flat blob.
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    if 5 < cv2.contourArea(cnt) < 50 and float(w) / h > 3:
        print("I have detected a minus sign at:", x, y, w, h)
After this filtering process you can make a calculated guess as to whether a given digit has a negative sign close to its left side.
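A minimal sketch of that merging step (the box formats below are assumptions, not the actual Google Vision response schema):

# minus_boxes: (x, y, w, h) tuples from the contour filter above
# digit_boxes: (x, y, w, h, text) tuples read out of the OCR response
def attach_minus(minus_boxes, digit_boxes, max_gap=15):
    out = []
    for dx, dy, dw, dh, text in digit_boxes:
        # A minus sign should sit just left of the digit, roughly centred vertically.
        negative = any(abs((my + mh / 2) - (dy + dh / 2)) < dh / 2 and
                       0 < dx - (mx + mw) < max_gap
                       for mx, my, mw, mh in minus_boxes)
        out.append('-' + text if negative else text)
    return out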
If the Google Vision API uses Tesseract, which I think it does,
then optimization is usually as follows:
Sharpen
Binarize (or grayscale if you must)
Trim borders (Tesseract likes smooth background)
Deskew (Tesseract tolerates very small skew angle. It likes nice straight text lines)
Reshape and resize (Put it in a page-like shape and resize if necessary)
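For illustration, a rough sketch of the sharpen / binarize / resize steps with OpenCV (parameter values are assumptions; border trimming and deskewing are omitted):

import cv2

img = cv2.imread('box.jpg', cv2.IMREAD_GRAYSCALE)     # hypothetical input crop

# Sharpen with a simple unsharp mask, then binarize with Otsu.
soft = cv2.GaussianBlur(img, (0, 0), 3)
sharp = cv2.addWeighted(img, 1.5, soft, -0.5, 0)
ret, binary = cv2.threshold(sharp, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Upscale so the glyphs are large enough for Tesseract to read comfortably.
binary = cv2.resize(binary, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)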
As for negative signs, well, use Tesseract directly, if you can.
You will be able to retrain it or to download better trainings.
Or you can correct the errors using an additional algorithm, i.e. implement your own recheck as suggested in ZdaR's answer.
I want to accurately convert the picture into a black-and-white image where the seeds are represented by white and the background by black. I would like to do this in Python with OpenCV. Please help me out.
I got a good result for the above picture using the code given below. Now I have another picture for which thresholding doesn't seem to work. How can I tackle this problem? The output I got is shown in the following picture.
Also, there are some dents in the seeds, which the program takes as part of the seed boundary; this gives poor results, as in the picture below. How can I make the program ignore the dents? Is masking the seeds a good option in this case?
I converted the image from BGR color space to HSV color space.
Then I extracted the hue channel:
Then I performed a threshold on it:
Note:
Whenever you face difficulty in certain areas, try working in a different color space, the HSV color space being the most prominent.
UPDATE:
Here is the code:
import cv2
import numpy as np

filename = 'seed.jpg'
img = cv2.imread(filename)                        #---Reading image file---
hsv_img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)    #---Converting BGR image to HSV---
hue, saturation, value = cv2.split(hsv_img)       #---Splitting HSV image into 3 channels---
blur = cv2.GaussianBlur(hue, (3, 3), 0)           #---Blur to smooth the edges---
ret, th = cv2.threshold(blur, 38, 255, 0)         #---Binary threshold---
cv2.imshow('th.jpg', th)
cv2.waitKey(0)
Now you can perform contour operations to highlight your regions of interest also. Try it out!! :)
ANOTHER UPDATE:
I kept the contours above a certain area constraint to get this:
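A small sketch of that contour filtering, continuing from the threshold image th above (the area cutoff and the two-value findContours return of OpenCV 4 are assumptions):

contours, hierarchy = cv2.findContours(th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

mask = np.zeros_like(th)
for cnt in contours:
    if cv2.contourArea(cnt) > 500:                #---Keep only contours above the area constraint---
        cv2.drawContours(mask, [cnt], -1, 255, -1)

cv2.imshow('seeds.jpg', mask)
cv2.waitKey(0)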
There are countless ways to do image segmentation.
The simplest one is a global threshold operation. If you want to know more about other methods, you should read some books, which I recommend anyway before you do any further image processing. It doesn't make much sense to start image processing if you don't know the most basic tools.
Just to show you how this could be achieved:
I converted the image from RGB to HSB. I then applied separate global thresholds to the hue and brightness channels to get the best segmentation result for both images.
Both binary images were then combined using a pixelwise AND operation. I did this because both channels gave sub-optimal results, but their overlap was pretty good.
I also applied some morphological operators to clean up the results.
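A hedged sketch of that combination (the threshold values and kernel size are assumptions; the originals would have been tuned per image):

import cv2

img = cv2.imread('seed.jpg')                                       # hypothetical filename
hue, sat, val = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))    # HSB is HSV in OpenCV

# Separate global thresholds on hue and brightness, then keep only their overlap.
ret1, hue_mask = cv2.threshold(hue, 38, 255, cv2.THRESH_BINARY)
ret2, val_mask = cv2.threshold(val, 120, 255, cv2.THRESH_BINARY)
combined = cv2.bitwise_and(hue_mask, val_mask)

# Morphological opening to clean up small speckles in the result.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
combined = cv2.morphologyEx(combined, cv2.MORPH_OPEN, kernel)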
Of course you can just invert the image to get the desired black background...
Thresholds and the channels used of course depend on the image you have and what you want to achieve. This is a very case-specific process that can be dynamically adapted only to a limited extent.
This could be followed by labeling or whatever else you need:
I need to blur faces to protect the privacy of people in street view images like Google does in Google Street View. The blur should not make the image aesthetically unpleasant. I read in the paper titled Large-scale Privacy Protection in Google Street View by Google (link) that Google does the following to blur the detected faces.
We chose to apply a combination of noise and aggressive Gaussian blur that we alpha-blend smoothly with the background starting at the edge of the box.
Can someone explain how to perform this task? I understand Gaussian Blur, but how to blend it with the background?
Code will be helpful but not required
My question is not how to blur a part of an image; it is how to blend the blurred portion with the background so that the blur is not unpleasant. Please refer to the quote I provided from the paper.
I have large images and a lot of them. An iterative process as in the possible duplicate would be time-consuming.
EDIT
If someone ever wants to do something like this, I wrote a Python implementation. It isn't exactly what I was asking for but it does the job.
Link: pyBlur
I'm reasonably sure the general idea is:
Create a shape for the area you want to blur (say a rectangle).
Extend your shape by X pixels outwards.
Apply a gradient on alpha from 0.0 .. 1.0 (or similar) over the extended area.
Blur the extended area (ignoring alpha).
Now use an alpha-blend to apply the modified image to the original image.
Adding noise, similar to that of the original image, would make it even less obvious that the region has been blurred (because the blur will of course also blur away the original noise).
I don't know the exact parameters for how much to grow, what values to use for the alpha gradient, etc, but that's what I understand from the quoted text.
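To make this concrete, a minimal sketch along those lines (the face box coordinates, growth amount, and blur strength are illustrative assumptions, not values from the paper):

import cv2
import numpy as np

img = cv2.imread('street.jpg')                 # hypothetical input
x, y, w, h, grow = 100, 80, 60, 60, 20         # detected face box, extended outwards by `grow` pixels

x0, y0 = max(x - grow, 0), max(y - grow, 0)
x1, y1 = min(x + w + grow, img.shape[1]), min(y + h + grow, img.shape[0])
roi = img[y0:y1, x0:x1]

blurred = cv2.GaussianBlur(roi, (31, 31), 0)

# Alpha mask: 1.0 inside the original box, fading towards 0.0 at the extended edge.
alpha = np.zeros(roi.shape[:2], np.float32)
alpha[y - y0:y - y0 + h, x - x0:x - x0 + w] = 1.0
alpha = cv2.GaussianBlur(alpha, (2 * grow + 1, 2 * grow + 1), 0)
alpha = alpha[..., None]

# Alpha-blend the blurred patch back into the original image.
img[y0:y1, x0:x1] = (alpha * blurred + (1 - alpha) * roi).astype(np.uint8)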