PIL produces grey pixels in OpenCV image - python

I have a strange output in my images: all the characters are surrounded by grey pixels. I am 90% sure this is caused by an OpenCV-PIL conversion issue, but I don't know how to solve it.
Here is the source image:
And the output (you need to zoom in to see the grey pixels):
A detail here:
This is the code I am using:
import cv2
import tesserocr as tr
from PIL import Image
import os

src = os.path.expanduser('~\\Desktop\\output4\\')
causali = os.listdir(src)  # build the list of files
causali.sort(key=lambda x: int(x.split('.')[0]))

for i, file in enumerate(causali):  # iterate over the causale files with an index
    cv_img = cv2.imread(os.path.expanduser('~\\Desktop\\output4\\{}'.format(file)), cv2.IMREAD_UNCHANGED)
    # since tesserocr accepts PIL images, converting opencv image to pil
    pil_img = Image.fromarray(cv2.cvtColor(cv_img, cv2.COLOR_BGR2RGB))
    # initialize api
    api = tr.PyTessBaseAPI()
    try:
        # set pil image for ocr
        api.SetImage(pil_img)
        # Google tesseract-ocr has a page segmentation method (psm) option for specifying ocr types
        # psm values can be: block of text, single text line, single word, single character etc.
        # the api.GetComponentImages method exposes this functionality
        # function returns:
        #   image (:class:`PIL.Image`): Image object.
        #   bounding box (dict): dict with x, y, w, h keys.
        #   block id (int): textline block id (if blockids is ``True``). ``None`` otherwise.
        #   paragraph id (int): textline paragraph id within its block (if paraids is True).
        #     ``None`` otherwise.
        boxes = api.GetComponentImages(tr.RIL.BLOCK, True)
        # get text
        text = api.GetUTF8Text()
        # iterate over returned list, draw rectangles
        for (im, box, _, _) in boxes:
            x, y, w, h = box['x'], box['y'], box['w'], box['h']
            cv_rect = cv2.rectangle(cv_img, (x - 10, y - 10), (x + w + 10, y + h + 10), color=(255, 255, 255), thickness=1)
            im.save(os.path.expanduser('~\\Desktop\\output5\\{}.png'.format(i)))
    finally:
        api.End()
Is there a way to make api.SetImage() accept an OpenCV variable?
Thanks
EDIT: Is there a way to delete all the grey pixels, given their color?

You need to use a binary thresholding algorithm to filter out the "noise" in your image.
C++ docs
Python docs
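For instance, a minimal sketch (assuming a grayscale input and hypothetical file names; Otsu's method picks the cutoff automatically, so grey border pixels snap to pure black or pure white):
import cv2
img = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)
# pixels above the automatically chosen threshold become 255, the rest 0
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite('output.png', binary)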

So, this is my solution. I found a way to use OpenCV instead of PIL, as long as OpenCV doesn't convert the image to JPEG anywhere in the process.
This way we keep a clean image from input to output.
Here is the code:
import cv2
import tesserocr as tr
from PIL import Image
import os

cv_img = cv2.imread('C:\\Users\\Link\\Desktop\\0.png', cv2.IMREAD_UNCHANGED)
idx = 0
# since tesserocr accepts PIL images, converting opencv image to pil
pil_img = Image.fromarray(cv_img)
# initialize api
api = tr.PyTessBaseAPI()
try:
    # set pil image for ocr
    api.SetImage(pil_img)
    # Google tesseract-ocr has a page segmentation method (psm) option for specifying ocr types
    # psm values can be: block of text, single text line, single word, single character etc.
    # the api.GetComponentImages method exposes this functionality
    # function returns:
    #   image (:class:`PIL.Image`): Image object.
    #   bounding box (dict): dict with x, y, w, h keys.
    #   block id (int): textline block id (if blockids is ``True``). ``None`` otherwise.
    #   paragraph id (int): textline paragraph id within its block (if paraids is True).
    #     ``None`` otherwise.
    boxes = api.GetComponentImages(tr.RIL.TEXTLINE, True)
    # get text
    text = api.GetUTF8Text()
    # iterate over returned list, draw rectangles
    for (im, box, _, _) in boxes:
        x, y, w, h = box['x'], box['y'], box['w'], box['h']
        cv_rect = cv2.rectangle(cv_img, (x - 10, y - 10), (x + w + 10, y + h + 10), color=(255, 255, 255), thickness=1)
        roi = cv_rect[y:y + h, x:x + w]
        cv2.imwrite(os.path.expanduser('~\\Desktop\\output5\\image.png'), roi)
finally:
    api.End()
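As a side note, if you want to skip PIL entirely, tesserocr also exposes SetImageBytes, which takes a raw byte buffer (a sketch under that assumption; check that your tesserocr version supports it, and note the buffer here is in OpenCV's BGR order, which rarely matters for OCR but can be converted with cv2.cvtColor first):
height, width = cv_img.shape[:2]
# 3 bytes per pixel for a color image, 1 for grayscale
bytes_per_pixel = cv_img.shape[2] if cv_img.ndim == 3 else 1
api.SetImageBytes(cv_img.tobytes(), width, height, bytes_per_pixel, bytes_per_pixel * width)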

Related

Bounding box detection for characters / digits

I have images, which look like the following:
I want to find the bounding boxes for the 8 digits. My first try was to use cv2 with the following code:
import cv2
import matplotlib.pyplot as plt
import cvlib as cv
from cvlib.object_detection import draw_bbox
im = cv2.imread('31197402.png')
bbox, label, conf = cv.detect_common_objects(im)
output_image = draw_bbox(im, bbox, label, conf)
plt.imshow(output_image)
plt.show()
Unfortunately that doesn't work. Does anyone have an idea?
The problem in your solution is likely the input image, which is very poor in quality. There’s hardly any contrast between the characters and the background. The blob detection algorithm from cvlib is probably failing to distinguish between character blobs and background, producing a useless binary mask. Let’s try to solve this using purely OpenCV.
I propose the following steps:
Apply adaptive threshold to get a reasonably good binary mask.
Clean the binary mask from blob noise using an area filter.
Improve the quality of the binary image using morphology.
Get the outer contours of each character and fit a bounding rectangle to each character blob.
Crop each character using the previously calculated bounding rectangle.
Let’s see the code:
# importing cv2 & numpy:
import numpy as np
import cv2
# Set image path
path = "C:/opencvImages/"
fileName = "mrrm9.png"
# Read input image:
inputImage = cv2.imread(path+fileName)
inputCopy = inputImage.copy()
# Convert BGR to grayscale:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
From here there's not much to discuss: we just read the BGR image and convert it to grayscale. Now, let's apply an adaptive threshold using the Gaussian method. This is the tricky part, as the parameters are adjusted manually depending on the quality of the input. The method works by dividing the image into a grid of cells of windowSize; it then applies a local threshold to find the optimal separation between foreground and background. An additional constant, indicated by windowConstant, can be added to the threshold to fine-tune the output:
# Set the adaptive thresholding (gaussian) parameters:
windowSize = 31
windowConstant = -1
# Apply the threshold:
binaryImage = cv2.adaptiveThreshold(grayscaleImage, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, windowSize, windowConstant)
You get this nice binary image:
Now, as you can see, the image has some blob noise. Let's apply an area filter to get rid of it. The noise blobs are smaller than the target blobs of interest, so we can easily filter them by area, like this:
# Perform an area filter on the binary blobs:
componentsNumber, labeledImage, componentStats, componentCentroids = \
cv2.connectedComponentsWithStats(binaryImage, connectivity=4)
# Set the minimum pixels for the area filter:
minArea = 20
# Get the indices/labels of the remaining components based on the area stat
# (skip the background component at index 0)
remainingComponentLabels = [i for i in range(1, componentsNumber) if componentStats[i][4] >= minArea]
# Filter the labeled pixels based on the remaining labels,
# assign pixel intensity to 255 (uint8) for the remaining pixels
filteredImage = np.where(np.isin(labeledImage, remainingComponentLabels), 255, 0).astype('uint8')
This is the filtered image:
We can improve the quality of this image with some morphology. Some of the characters seem to be broken (check out the first 3: it is broken into two separate blobs). We can join them by applying a closing operation:
# Set kernel (structuring element) size:
kernelSize = 3
# Set operation iterations:
opIterations = 1
# Get the structuring element:
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
# Perform closing:
closingImage = cv2.morphologyEx(filteredImage, cv2.MORPH_CLOSE, maxKernel, None, None, opIterations, cv2.BORDER_REFLECT101)
This is the "closed" image:
Now, you want to get the bounding boxes for each character. Let’s detect the outer contour of each blob and fit a nice rectangle around it:
# Get each bounding box
# Find the big contours/blobs on the filtered image:
contours, hierarchy = cv2.findContours(closingImage, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
contours_poly = [None] * len(contours)
# The Bounding Rectangles will be stored here:
boundRect = []
# Alright, just look for the outer bounding boxes:
for i, c in enumerate(contours):
    if hierarchy[0][i][3] == -1:
        contours_poly[i] = cv2.approxPolyDP(c, 3, True)
        boundRect.append(cv2.boundingRect(contours_poly[i]))
# Draw the bounding boxes on the (copied) input image:
for i in range(len(boundRect)):
    color = (0, 255, 0)
    cv2.rectangle(inputCopy, (int(boundRect[i][0]), int(boundRect[i][1])), \
                  (int(boundRect[i][0] + boundRect[i][2]), int(boundRect[i][1] + boundRect[i][3])), color, 2)
The last for loop is pretty much optional. It fetches each bounding rectangle from the list and draws it on the input image, so you can see each individual rectangle, like this:
Let's visualize that on the binary image:
Additionally, if you want to crop each character using the bounding boxes we just got, you do it like this:
# Crop the characters:
for i in range(len(boundRect)):
    # Get the roi for each bounding rectangle:
    x, y, w, h = boundRect[i]
    # Crop the roi:
    croppedImg = closingImage[y:y + h, x:x + w]
    cv2.imshow("Cropped Character: " + str(i), croppedImg)
    cv2.waitKey(0)
This is how you can get the individual bounding boxes. Now, maybe you are trying to pass these images to an OCR. I tried passing the filtered binary image (after the closing operation) to pyocr (that's the OCR I'm using), and I get this output string: 31197402
The code I used to get the OCR of the closed image is this:
# Set the OCR libraries:
from PIL import Image
import pyocr
import pyocr.builders
# Set pyocr tools:
tools = pyocr.get_available_tools()
# The tools are returned in the recommended order of usage
tool = tools[0]
# Set OCR language:
langs = tool.get_available_languages()
lang = langs[0]
# Get string from image:
txt = tool.image_to_string(
    Image.open(path + "closingImage.png"),
    lang=lang,
    builder=pyocr.builders.TextBuilder()
)
print("Text is: " + txt)
Be aware that the OCR receives black characters on white background, so you must invert the image first.
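A minimal inversion sketch with OpenCV, assuming closingImage is the white-on-black result of the closing step above:
# invert so the characters become black on a white background, then save for the OCR
invertedImage = cv2.bitwise_not(closingImage)
cv2.imwrite(path + "closingImage.png", invertedImage)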

How to visualize the segmented image of Selective Search?

How can I visualize the segmented image output of the Selective Search algorithm applied on an image?
import cv2
image = cv2.imread("x.jpg")
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchQuality()
rects = ss.process()
That is, to get the image on the right
I am not sure, but I think the image you require possibly cannot be obtained.
The reason:
Open this file first, containing the source code.
In lines 726-734, the variable "images" is private, and in the method switchToSelectiveSearchQuality() at line 828, the different images used for computation are stored in that private variable "images" (follow the addImage function to see this).
The images stored in the "images" variable are then sent for segmentation processing at line 901. The method called there is processImage() of the class "GraphSegmentation", which I am not able to trace backwards.
Thus, it is possible that the image you require is not stored anywhere at all, or else is stored in a private variable which we cannot access.
EDIT: Found the "GraphSegmentation" class and the declaration of the method "processImage" in this file, at lines 46 and 52.
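That said, OpenCV does expose the underlying graph-based segmenter directly, so you may be able to reproduce a similar label image yourself. A sketch (the sigma, k and min_size values are guesses you would need to tune):
import cv2
import numpy as np
image = cv2.imread("x.jpg")
# the same graph segmentation that selective search uses internally
gs = cv2.ximgproc.segmentation.createGraphSegmentation(sigma=0.8, k=300, min_size=100)
labels = gs.processImage(image)  # one integer region label per pixel
# color each region randomly so the segmentation is visible
rng = np.random.default_rng(0)
palette = rng.integers(0, 256, size=(labels.max() + 1, 3), dtype=np.uint8)
cv2.imwrite("segments.png", palette[labels])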
I think you can use the following. I tried it and it works:
import cv2, random
image = cv2.imread("x.jpg")
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchQuality()
rects = ss.process()
for i in range(0, len(rects), 100):
    # clone the original image so we can draw on it
    output = image.copy()
    # loop over the current subset of region proposals
    for (x, y, w, h) in rects[i:i + 100]:
        # draw the region proposal bounding box on the image
        color = [random.randint(0, 255) for j in range(0, 3)]
        cv2.rectangle(output, (x, y), (x + w, y + h), color, 2)
    cv2.imshow("Output", output)
    key = cv2.waitKey(0) & 0xFF
    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break
Why 100? It's just the chunk size I chose, so the proposals are displayed 100 at a time.
Original Image:
After processing:

Are there ways to do some crops in an image based on a given pattern?

I have an image of pots, all of the same size. The user must crop an area of the image (say the first pot, in the top left corner), and based on the pattern designed or cropped by the user, I must automatically perform the other crops and save their coordinates. Is there another technique to do this without template matching, or do you think I can improve my code to do it with template matching only?
So far I have tried template matching and saved the coordinates of each match, but as you can see in the attached image, the result is not quite satisfying: I don't match all the pots, and for some areas I draw several rectangles for just one pot (the more I lower the threshold, the worse this gets). I am considering the fix sketched after the code. Any help is highly appreciated.
Here is my code
# import necessary dependies
import cv2
import numpy as np
# Read the source image
img_rgb = cv2.imread('image.jpg')
# Convert the source image to gray
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
# Read the pattern image designed by the user
template = cv2.imread('mattern.png',0)
# Get the shape of the pattern image
w, h = template.shape[::-1]
# Apply cv2 template matching functon
res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
# List of coordinates of the matched templates
list_coordinates = []
labeled_coordinates = []
# Threshold applied to the template matching
threshold = 0.55
# Apply the threshold to cv2 the template matching
loc = np.where( res >= threshold)
# Directory to save the matched templates (pattern)
s_dir = "s_img/"
# Counter to add in the name of the saved image
i = 1
for pt in zip(*loc[::-1]):
    # Draw a rectangle around each area that satisfies the condition
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 1)
    top_left = pt
    # Crop every matched template (pattern)
    crop_img = img_rgb[top_left[1]:top_left[1] + h, top_left[0]:top_left[0] + w]
    bottom_right = (top_left[0] + w, top_left[1] + h)
    # Save the cropped patterns for future use
    cv2.imwrite(s_dir + "crop_1_" + str(i) + ".png", crop_img)
    # Label the coordinates for future use
    labeled_coordinates = ["crop_1_" + str(i), top_left[0], top_left[1], bottom_right[0], bottom_right[1]]
    # Add the coordinates to a list
    list_coordinates.append(labeled_coordinates)
    i += 1
cv2.imshow('template',template)
cv2.imshow('mathced',img_rgb)
cv2.waitKey(0)
cv2.destroyAllWindows()
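One thing I am considering (a sketch, untested) is collapsing the duplicate matches with cv2.groupRectangles before drawing, since it merges overlapping boxes:
# one [x, y, w, h] box per match above the threshold
boxes = [[int(pt[0]), int(pt[1]), int(w), int(h)] for pt in zip(*loc[::-1])]
# duplicate the list so isolated boxes survive groupThreshold=1
merged, weights = cv2.groupRectangles(boxes + boxes, groupThreshold=1, eps=0.5)
for (x, y, bw, bh) in merged:
    cv2.rectangle(img_rgb, (x, y), (x + bw, y + bh), (0, 255, 0), 2)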

Python face_recognition dataset quality

I'm constructing a dataset with more than one image per person for the python face_recognition package. It will add a classifier on top of the built-in model. See also this example: face_recognition_knn.py. Here is my code:
# import the necessary packages
from imutils import paths
import face_recognition
import pickle
import cv2
import os
# grab the paths to the input images in our dataset
print("[INFO] quantifying faces...")
imagePaths = list(paths.list_images('dataset'))
# initialize the list of known encodings and known names
knownEncodings = []
knownNames = []
# loop over the image paths
for (i, imagePath) in enumerate(imagePaths):
    # extract the person name from the image path
    print(f"[INFO] processing image {i+1}/{len(imagePaths)} -> {imagePath}")
    name = imagePath.split(os.path.sep)[-2]
    # load the input image and convert it from BGR (OpenCV ordering)
    # to dlib ordering (RGB)
    image = cv2.imread(imagePath)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # detect the (x, y)-coordinates of the bounding boxes
    # corresponding to each face in the input image
    boxes = face_recognition.face_locations(rgb, model='hog')  # can be cnn
    # compute the facial embedding for each face
    encodings = face_recognition.face_encodings(rgb, boxes)
    # loop over the encodings
    for encoding in encodings:
        # add each encoding + name to our set of known names and encodings
        knownEncodings.append(encoding)
        knownNames.append(name)
# dump the facial encodings + names to disk
print("[INFO] serializing encodings...")
data = {"encodings": knownEncodings, "names": knownNames}
f = open('encodings.pickle', "wb")
f.write(pickle.dumps(data))
f.close()
Then, I try to identify these people with this code:
import face_recognition
import pickle
import cv2
import numpy as np
import requests
from datetime import datetime
# load the known faces and embeddings
print("[INFO] loading encodings...")
data = pickle.loads(open("encodings.pickle", "rb").read())
def processa_imagem(url):
    # load the input image, convert it from BGR to RGB, and return a file with confidence
    image = cv2.imread(url)
    if image is None:
        print(f'Image not found: {url}')
    #image = np.array(image, dtype=np.uint8)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # detect the (x, y)-coordinates of the bounding boxes corresponding
    # to each face in the input image, then compute the facial embeddings
    # for each face
    print("[INFO] recognizing faces...")
    boxes = face_recognition.face_locations(rgb, model='hog')
    encodings = face_recognition.face_encodings(rgb, boxes)
    # initialize the list of names for each face detected
    names = []
    # loop over the facial embeddings
    for encoding in encodings:
        # attempt to match each face in the input image to our known encodings
        ## ATTENTION! the ideal is face_recognition.api.batch_face_locations but I don't have a GPU
        matches = face_recognition.face_distance(data["encodings"], encoding)
        name = "unknown"
        # check to see if we have found a match
        if max(matches) > 0.7:
            # find the indexes of all matched faces, then initialize a
            # dictionary to count the total number of times each face
            # was matched
            matchedIdxs = [i for (i, b) in enumerate(matches) if b]
            counts = {}
            # loop over the matched indexes and maintain a count for
            # each recognized face
            for i in matchedIdxs:
                name = data["names"][i]
                counts[name] = counts.get(name, 0) + 1
            # determine the recognized face with the largest number of
            # votes (note: in the event of an unlikely tie Python will
            # select the first entry in the dictionary)
            name = max(counts, key=counts.get)
        # update the list of names
        names.append(name)
    # loop over the recognized faces
    for ((top, right, bottom, left), name) in zip(boxes, names):
        # draw the predicted face name on the image
        cv2.rectangle(image, (left, top), (right, bottom), (255, 0, 0), 2)
        y = top - 15 if top - 15 > 15 else top + 15
        cv2.putText(image, name, (left, y), cv2.FONT_HERSHEY_SIMPLEX,
                    0.75, (255, 0, 0), 2)
    now = datetime.now()
    current_time = now.strftime("%H%M%S%f")
    #file_path = f'static/face-{current_time}.jpg'
    file_path = f'face-{current_time}.jpg'
    cv2.imwrite(file_path, image)
    return (file_path, ', '.join(names))
On my dataset I've added, on average, about 10 photos of each individual. The script uses face_recognition.face_distance and it works well for recognizing someone who is in the dataset.
The problem is when I use it with someone who is NOT in the dataset: for these people I sometimes still get false positives with about 0.90 confidence.
Some of the pictures in the dataset are low quality. Maybe that's the reason? Should I change my approach, using fewer but more detailed photos (2 or 3) and maybe encoding them with jitters, as sketched below?
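For reference, this is the kind of change I mean; num_jitters re-samples each face several times before computing the embedding (the value 10 here is just an example):
# in the dataset-building loop, re-sample each detected face 10 times when encoding
encodings = face_recognition.face_encodings(rgb, boxes, num_jitters=10)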
Thanks for any input!

How to remove multiple polygons using Opencv python

Hi StackOverflow team,
I have an image and I want to remove many portions/parts from it. I tried to use the code below, taken from Cropping Concave polygon from Image using Opencv python.
Assume I have this image. Also, I have multiple polygons (such as rectangular shapes, or any form of polygon) from the image, obtained via the Labelme annotation tool. I want to remove those shapes from the image, or simply change their pixels to white.
In other words, the Labelme tool gives you a dictionary file, where the dictionary has a key consisting of the points of each portion/polygon/shape.
The polygon points can then easily be extracted from the dictionary file. After the points are extracted, we can define our points by giving them names (e.g. a, b, c...h), and each one is in this multidimensional format: "[[1526, 319], [1526, 376], [1593, 379], [1591, 324]]"
Here I thought of whitening each region, but whitening a multidimensional array seems to be unreliable.
import numpy as np
import cv2
import json

with open('ann1.json') as f:
    data = json.load(f)
#%%
a = data['shapes'][0]['points']; b = data['shapes'][1]['points']; c = data['shapes'][2]['points'];
#%%
img = cv2.imread("lena.jpg")
pts = np.array(a) # Points
#%%
## (1) Crop the bounding rect
rect = cv2.boundingRect(pts)
x, y, w, h = rect
cropped = img[y:y+h, x:x+w].copy()
## (2) make mask
pts = pts - pts.min(axis=0)
mask = np.zeros(cropped.shape[:2], np.uint8)
cv2.drawContours(mask, [pts], -1, (255, 255, 255), -1, cv2.LINE_AA)
## (3) do bit-op
dst = cv2.bitwise_and(cropped, cropped, mask=mask)
## (4) add the white background
bg = np.ones_like(cropped, np.uint8) * 255
cv2.bitwise_not(bg, bg, mask=mask)
dst2 = bg + dst
#cv2.imwrite("cropped.png", cropped)
#cv2.imwrite("mask.png", mask)
#cv2.imwrite("dst.png", dst)
cv2.imwrite("dst2.png", dst2)
Using Lena I have this output.
But I need to go further and whiten other points/polygons, for example the eyes.
As you can see, my code can only use one polygon's points. I tried appending two other polygons' points (in my case the two eyes) and got this.
By appending, I mean I concatenated the multidimensional points (e.g. pts = np.array(a+b+c)); that merges everything into a single contour, so drawContours fills one region spanning all three shapes instead of three separate ones.
In short: given an image, is there a short way to remove these multiple polygons from the image (keeping the dimensions of the image) using OpenCV and Python?
Json File:
https://drive.google.com/file/d/1UyOYUVMHpu2vBBEdR99bwrRX5xIfdOCa/view?usp=sharing
You'll need to use a loop to go through all the points in the JSON file. I've edited your code to reflect this.
import cv2
import json
import matplotlib.pyplot as plt
import numpy as np

img_path = r"/path/to/lena.png"
json_path = r"/path/to/lena.json"

with open(json_path) as f:
    data = json.load(f)

img = cv2.imread(img_path)
for idx in np.arange(len(data['shapes'])):
    if idx == 0: # can remove this
        continue # can remove this
    a = data['shapes'][idx]['points']
    pts = np.array(a) # Points
    ## (1) Crop the bounding rect
    rect = cv2.boundingRect(pts)
    print(rect)
    x, y, w, h = rect
    img[y:y+h, x:x+w] = (255, 255, 255)

plt.imshow(img)
plt.show()
Output:
I ignored the first shape since it didn't visualize nicely. I took your lead and used rectangles instead of polygons. If you need polygons, you'll need to use something like cv2.drawContours(), cv2.polylines() or cv2.fillPoly(), as recommended in the SO answer you linked, to achieve it.
I would like to share with you my expected solution, which is a slightly modified version of @Shawn Mathew's answer.
Input image:
Code:
with open('lena.json') as f:
    json_file = json.load(f)

img = cv2.imread("folder/lena.jpg")
for polygon in np.arange(len(json_file['shapes'])):
    # fillPoly expects integer coordinates, so cast the Labelme float points
    pts = np.array(json_file['shapes'][polygon]['points'], dtype=np.int32)
    # If your polygons are rectangular, you can fill the areas you want removed with white by uncommenting the two lines below
    # x, y, w, h = cv2.boundingRect(pts)
    # cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 255), -1)
    # if your polygons are shapes other than rectangles, you can just use the line below
    cv2.fillPoly(img, pts=[pts], color=(255, 255, 255))

plt.imshow(img)
plt.show()
The image colors look shifted because Matplotlib expects RGB while OpenCV stores images in BGR order; if you want to preserve the colors, save the image with cv2.imwrite instead.
