I'm trying to create a simple Eigenfaces face recognition app using Python and OpenCV. Unfortunately, when I run the app I get this result:
(-1, '\n', 1.7976931348623157e+308), where -1 stands for "not found" and the confidence is extremely high...
Could someone post the most basic OpenCV implementation of Eigenfaces?
Here is my approach to the problem. I use Python 2, as suggested in the official documentation (due to some problems with Python 3).
import cv2 as cv
import numpy as np
import os
num_components = 10
threshold = 10.0
faceRecognizer = cv.face_EigenFaceRecognizer.create(num_components, threshold)
images = []
labels = []
textLabels = ["Person1", "Person2", "Person3"]
destinedIm = cv.imread("images/set1/1.jpg", cv.IMREAD_GRAYSCALE)
destinedSize = destinedIm.shape
#Person1
img = cv.imread("images/set1/1.jpg", cv.IMREAD_GRAYSCALE)
imResized = cv.resize(img, destinedSize)
images.append(imResized)
labels.append(0)
#In similar way I read total 8 images of set1 and 6 images of set2 (2 different people, with label 0 and 1 respectively)
cv.imwrite("images/set2/resized.jpg", imResized) #this doesn't work
numpyImages = np.array(images)
numpyLabels = np.array(labels)
# cv.face_FaceRecognizer.train(self=faceRecognizer, src=images, labels=labels)
faceRecognizer.train(src=images, labels=numpyLabels)
testImage = cv.imread("images/set1/testIm.jpg", cv.IMREAD_GRAYSCALE)
# cv.face_FaceRecognizer.predict()
resultLabel, resultConfidence = faceRecognizer.predict(testImage)
print (resultLabel, "\n" ,resultConfidence)
testImage is another image of the person with label 0.
I would look at the sizing of the testImage. Also, I used a different sizing method than you used and got it working.
face_resized = cv2.resize(img, (299, 299))
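A minimal sketch of that idea using the question's own variable names: resize the test image to the training size before calling predict. Note that cv.resize expects (width, height), which is the reverse of numpy's shape order.
import cv2 as cv

# Eigenfaces needs the probe image to have exactly the same size as the training images.
testImage = cv.imread("images/set1/testIm.jpg", cv.IMREAD_GRAYSCALE)
testImage = cv.resize(testImage, (destinedSize[1], destinedSize[0]))  # (width, height)
resultLabel, resultConfidence = faceRecognizer.predict(testImage)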
I am building an OCR model where I have performed object detection on the images. I call the detection function to get bounding boxes and then crop the images based on those boxes. The challenge I am facing is that the cropped images are too small for Tesseract to extract data from, which hurts accuracy.
import tensorflow as tf
from PIL import Image, ImageOps

# Crop the image to the detected bounding box
cropped_image = tf.image.crop_to_bounding_box(image, y_min, x_min, y_max - y_min, x_max - x_min)

# Write the crop as a JPEG with Pillow
img_pil = Image.fromarray(cropped_image.numpy())
score = bscores[idx] * 100
file_name = OUTPUT_PATH + "somefilename"
img_pil = ImageOps.grayscale(img_pil)
img_pil.save(file_name, quality=95, subsampling=0)
I am running a super-resolution algorithm over the cropped images to improve image quality before passing them to Tesseract, but I am still not able to achieve good accuracy.
import os
from cv2 import dnn_superres

# Create an SR object
sr = dnn_superres.DnnSuperResImpl_create()

# Define model path
model_path = os.path.join(base_path, model + ".pb")

# Extract the model name: the text between the last path separator and '_'
model_name = model_path.split('\\')[-1].split('_')[0].lower()

# Extract the model scale from the file name
model_scale = int(model_path.split('\\')[-1].split('_')[1].split('.')[0][1])

# Read the desired model
sr.readModel(model_path)
sr.setModel(model_name, model_scale)
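The upscaling step itself is not shown above; with cv2.dnn_superres it is a single call on the loaded model. A rough sketch (the file names here are placeholders):
import cv2

# Apply the loaded super-resolution model to one cropped image (placeholder file names).
cropped = cv2.imread("cropped_word.png")
upscaled = sr.upsample(cropped)
cv2.imwrite("cropped_word_upscaled.png", upscaled)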
How can I fix this cropped-image issue so that data extraction is more accurate?
Have you tried OCRing and then cropping, rather than the reverse? It may take longer but it is likely going to be more accurate.
I have a lot of experience using ocrmypdf with PDFPlumber and regex to parse PDF documents into spreadsheets, and this is the process I generally follow:
import pandas as pd
import os
import pdfplumber
import re

# OCR the PDF in place
os.system('ocrmypdf --force-ocr --deskew path/to/file.pdf path/to/file.pdf')

# Extract the text of every page into one string
pdf_text = ''
with pdfplumber.open('path/to/file.pdf') as pdf:
    for i in range(0, len(pdf.pages)):
        page = pdf.pages[i]
        text = page.extract_text()
        pdf_text = pdf_text + '\n' + text

# Find record identifiers, then slice the text into one chunk per record
ids = re.findall('id: (.*)', pdf_text)
y = pdf_text.split('\n')
ds = []
for i, j in enumerate(ids):
    d = {}
    try:
        id1 = ids[i]
        idx1 = [idx for idx, s in enumerate(y) if id1 in s][0]
        try:
            id2 = ids[i + 1]
            idx2 = [idx for idx, s in enumerate(y) if id2 in s][0]
            z = y[idx1:idx2]
        except IndexError:
            z = y[idx1:]
    except IndexError:
        pass
    chunk = '\n'.join(z)
    # may need to add if/else or try/except
    d['value'] = re.findall('Model name: (.*)', chunk)[0]
    # rinse and repeat for the other fields you need
    ds.append(d)
df = pd.DataFrame(ds)
Not sure how helpful that will be, but it may give you some inspiration.
I am new to deep learning and machine learning algorithms, as well as to working with data. I am currently trying to work with an annotated video dataset, and I have looked for a simple example of how to get started. I am aware that to work with a video dataset we first need to extract the images from the videos and then do the image processing. However, as I am new, the steps are still difficult for me to understand. I came across this link; it is great, but the data is really large and cannot be downloaded on my computer.
https://www.analyticsvidhya.com/blog/2019/09/step-by-step-deep-learning-tutorial-video-classification-python/
Any suggestions for walkthrough examples I can use to build my understanding and learn how to deal with these datasets?
Here is a way to create a synthetic video dataset quickly:
import numpy as np
import skvideo.io as sk
# creating sample video data (Here object is moving towards left)
num_vids = 5
num_imgs = 50
img_size = 50
min_object_size = 1
max_object_size = 5
for i_vid in range(num_vids):
    imgs = np.zeros((num_imgs, img_size, img_size))  # set background to 0
    vid_name = "vid" + str(i_vid) + ".mp4"
    w, h = np.random.randint(min_object_size, max_object_size, size=2)
    x = np.random.randint(0, img_size - w)
    y = np.random.randint(0, img_size - h)
    i_img = 0
    while x > 0:
        imgs[i_img, y : y + h, x : x + w] = 255  # set rectangle as foreground
        x = x - 1
        i_img = i_img + 1
    sk.vwrite(vid_name, imgs.astype(np.uint8))
from IPython.display import Video
Video("vid3.mp4") # the script & video generated should be in same folder
Similarly, you can create videos where the object(s) move in other directions.
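To feed these clips into a model, they can be read back as numpy arrays with the same library; a minimal sketch, assuming the files written above:
import numpy as np
import skvideo.io as sk

# Read one synthetic clip back as a (num_frames, height, width, channels) array.
frames = sk.vread("vid0.mp4")
print(frames.shape)  # e.g. (50, 50, 50, 3)

# Collapse to grayscale and scale to [0, 1] for a simple model input.
gray = frames.mean(axis=-1) / 255.0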
EDIT: Code is working now, thanks to Mark and zephyr. zephyr also has two alternate working solutions below.
I want to divide blend two images with PIL. I found ImageChops.multiply(image1, image2) but I couldn't find a similar divide(image, image2) function.
Divide Blend Mode Explained (I used the first two images here as my test sources.)
Is there a built-in divide blend function that I missed (PIL or otherwise)?
My test code below runs and is getting close to what I'm looking for. The resulting image output is similar to the divide blend example image here: Divide Blend Mode Explained.
Is there a more efficient way to do this divide blend operation (less steps and faster)? At first, I tried using lambda functions in Image.eval and ImageMath.eval to check for black pixels and flip them to white during the division process, but I couldn't get either to produce the correct result.
EDIT: Fixed code and shortened thanks to Mark and zephyr. The resulting image output matches the output from zephyr's numpy and scipy solutions below.
# PIL Divide Blend test
from PIL import Image, ImageMath
import os
imgA = Image.open('01background.jpg')
imgA.load()
imgB = Image.open('02testgray.jpg')
imgB.load()
# split RGB images into 3 channels
rA, gA, bA = imgA.split()
rB, gB, bB = imgB.split()
# divide each channel (image1/image2)
rTmp = ImageMath.eval("int(a/((float(b)+1)/256))", a=rA, b=rB).convert('L')
gTmp = ImageMath.eval("int(a/((float(b)+1)/256))", a=gA, b=gB).convert('L')
bTmp = ImageMath.eval("int(a/((float(b)+1)/256))", a=bA, b=bB).convert('L')
# merge channels into RGB image
imgOut = Image.merge("RGB", (rTmp, gTmp, bTmp))
imgOut.save('PILdiv0.png', 'PNG')
os.system('start PILdiv0.png')
You are asking:
Is there a more efficient way to do this divide blend operation (less steps and faster)?
You could also use the Python package blend_modes. It is written with vectorized NumPy math and is generally fast. Install it via pip install blend_modes. I have written the commands in a more verbose way to improve readability; it would be shorter to chain them. Use blend_modes like this to divide your images:
from PIL import Image
import numpy
import os
from blend_modes import blend_modes
# Load images
imgA = Image.open('01background.jpg')
imgA = numpy.array(imgA)
# append alpha channel
imgA = numpy.dstack((imgA, numpy.ones((imgA.shape[0], imgA.shape[1], 1))*255))
imgA = imgA.astype(float)
imgB = Image.open('02testgray.jpg')
imgB = numpy.array(imgB)
# append alpha channel
imgB = numpy.dstack((imgB, numpy.ones((imgB.shape[0], imgB.shape[1], 1))*255))
imgB = imgB.astype(float)
# Divide images
imgOut = blend_modes.divide(imgA, imgB, 1.0)
# Save images
imgOut = numpy.uint8(imgOut)
imgOut = Image.fromarray(imgOut)
imgOut.save('PILdiv0.png', 'PNG')
os.system('start PILdiv0.png')
Be aware that for this to work, both images need to have the same dimensions, e.g. imgA.shape == (240,320,3) and imgB.shape == (240,320,3).
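If they do not match, one option is to resize one image to the other's size before converting to numpy arrays; a small sketch:
from PIL import Image

imgA = Image.open('01background.jpg')
imgB = Image.open('02testgray.jpg')

# Match imgB to imgA's (width, height) before converting to numpy arrays.
if imgB.size != imgA.size:
    imgB = imgB.resize(imgA.size)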
There is a mathematical definition for the divide function here:
http://www.linuxtopia.org/online_books/graphics_tools/gimp_advanced_guide/gimp_guide_node55_002.html
Here's an implementation with scipy/matplotlib:
import numpy as np
import scipy.misc as mpl  # note: imread/imsave/imshow were removed from scipy.misc in newer SciPy releases
a = mpl.imread('01background.jpg')
b = mpl.imread('02testgray.jpg')
c = a/((b.astype('float')+1)/256)
d = c*(c < 255)+255*np.ones(np.shape(c))*(c > 255)
e = d.astype('uint8')
mpl.imshow(e)
mpl.imsave('output.png', e)
If you don't want to use matplotlib, you can do it like this (I assume you have numpy):
from PIL import Image
from numpy import asarray, ones, shape
imgA = Image.open('01background.jpg')
imgA.load()
imgB = Image.open('02testgray.jpg')
imgB.load()
a = asarray(imgA)
b = asarray(imgB)
c = a/((b.astype('float')+1)/256)
d = c*(c < 255)+255*ones(shape(c))*(c > 255)
e = d.astype('uint8')
imgOut = Image.fromarray(e)
imgOut.save('PILdiv0.png', 'PNG')
The problem you're having is when you have a zero in image B - it causes a divide by zero. If you convert all of those values to one instead I think you'll get the desired result. That will eliminate the need to check for zeros and fix them in the result.
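One way to apply that suggestion with numpy, as a sketch rather than the poster's exact code:
import numpy as np
from PIL import Image

a = np.asarray(Image.open('01background.jpg'), dtype=float)
b = np.asarray(Image.open('02testgray.jpg'), dtype=float)
b[b == 0] = 1                                       # replace zeros so the division is defined
c = np.clip(a / b * 256, 0, 255).astype('uint8')    # divide blend, clipped to the 8-bit range
Image.fromarray(c).save('PILdiv0.png')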
I am trying to remove a fixed background from an image with a single free-falling object. The background is white with a circular patch in the middle.
Below is my code for this task. I have used the OpenCV BackgroundSubtractorKNN and BackgroundSubtractorMOG2 algorithms. The left images should be given as input, and the code should produce the right images as output.
import numpy as np
import cv2
import sys
import os
#backgroundSubtractor = cv2.createBackgroundSubtractorMOG2()
backgroundSubtractor = cv2.createBackgroundSubtractorKNN()
# apply the algorithm for background images using learning rate > 0
for i in range(1, 16):
    bgImageFile = "background/BG.png"
    print("Opening background", bgImageFile)
    bg = cv2.imread(bgImageFile)
    backgroundSubtractor.apply(bg, learningRate=0.5)
# apply the algorithm for detection image using learning rate 0
dirc = os.getcwd()
filepath = os.path.join(dirc,'data')
if not os.path.exists('saved_bgRemoved'):
os.makedirs('saved_bgRemoved')
for file in os.listdir(filepath):
    stillFrame = cv2.imread(os.path.join(filepath, file))
    fgmask = backgroundSubtractor.apply(stillFrame, learningRate=0)
    bgImg = cv2.bitwise_and(stillFrame, stillFrame, mask=fgmask)

    # show both images
    cv2.imshow("original", stillFrame)
    cv2.imshow("mask", fgmask)
    cv2.imshow("Cut Image", bgImg)
    cv2.waitKey()
    cv2.destroyAllWindows()

    cv2.imwrite(os.path.join('saved_bgRemoved', file), bgImg)
My code works very well with the above dataset, but it fails to work with the image data below:
It also doesn't work if the object has a greyish texture. I think it works well when the pixel distribution of the object is uniform and differs from the background (i.e. the circular patch).
Is there any other best way to achieve this task, so that it can subtract the background even from the hollow area of the object, without subtracting parts of the object?
Use the code below; I think it works now:
import cv2, os
def remove_bg(bg_path, im_path):
    bg = cv2.imread(bg_path)
    im = cv2.imread(im_path)
    row, col, _ = im.shape
    for i in range(0, row):
        for j in range(0, col):
            if (bg[i][j][0] == im[i][j][0] and bg[i][j][1] == im[i][j][1] and bg[i][j][2] == im[i][j][2]):
                # replace the background with black; change to e.g. [0, 0, 255] for red (OpenCV uses BGR order)
                im[i][j] = [0, 0, 0]
    return im
directory,_=os.path.split(__file__)
bg_path = directory + "\\background.png"
im_path = directory + "\\data6.png"
result = remove_bg(bg_path,im_path)
cv2.imshow("result", result)
cv2.waitKey()
cv2.imwrite(directory + "\\Result.png", result)
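For larger images the per-pixel loop above will be slow; the same exact-match comparison can be vectorised with numpy, as a sketch under the same assumption that both images have identical dimensions:
import cv2
import numpy as np

bg = cv2.imread("background.png")
im = cv2.imread("data6.png")

# Pixels where all three channels match the background exactly become black.
mask = np.all(bg == im, axis=2)
im[mask] = [0, 0, 0]
cv2.imwrite("Result.png", im)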
I have an RGB image. When I import this image, I convert it to HSV using matplotlib.colors and save the resulting array in a dict. When I want to display the image, I use Image.fromarray with mode='HSV'. I'm not sure what I am doing wrong, but when the image is displayed I get a mess (seen below along with the code). Any help is appreciated. The code snippets below are roughly what happens, in order, to any given set of imported images.
RGB to HSV Code:
from skimage import io
import matplotlib.colors as mpclr
import numpy as np
import glob
import os
from PIL import Image, ImageOps

types = ("*.tif", "*.jpg", "*.ppm")
imagePath = []

def importAllImgs(folderPath):
    for ext in types:
        imagePath.extend(glob.glob(folderPath + ext))
    im_coll = io.ImageCollection(imagePath, conserve_memory=True)
    im_array = []
    for i in range(len(im_coll)):
        # CONVERSION HAPPENS HERE
        image = im_coll[i]
        fltImg = np.around((np.array(image) / 255.0), decimals=2)
        imgHSV = mpclr.rgb_to_hsv(fltImg)
        im_array.append(imgHSV)
    return im_array, imagePath
Storage of Data:
def organizeAllData(self, imgArrList, imgPathList):
    self.allImages = dict()
    self.imageKeys = imgPathList
    for i in range(len(imgPathList)):
        self.allImages[imgPathList[i]] = {'H': imgArrList[i][:, :, 0],
                                          'S': imgArrList[i][:, :, 1],
                                          'V': imgArrList[i][:, :, 2]}
    self.hsvValues = []
    self.labelValues = []
    return self.allImages
Construction of array for displaying image:
def getImage(self, imageOfInterest):
    H = self.allImages[imageOfInterest]['H'][:, :]
    S = self.allImages[imageOfInterest]['S'][:, :]
    V = self.allImages[imageOfInterest]['V'][:, :]
    imgArray = np.dstack((H, S, V))
    return imgArray
Displaying of Image:
preImArray = halThrThsnd.getImage(self.imagePaths[self.imageIndex])
self.preIm = Image.fromarray(preImArray, 'HSV')
And finally, the resulting image:
As per user sascha's comment below the question, I decided to normalize the libraries I'm using for HSV conversion. Once I did that, I got normal images with no problem. It turns out that, depending on which library you use for the conversion, you will get different HSV value ranges: some libraries produce values from 0 to 1, while others produce values from 0 to 255.
Tl;dr: Used the same library across all processes, got a good image.
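For reference, if you do mix libraries, the fix is simply to rescale matplotlib's 0-to-1 HSV floats to the 0-to-255 uint8 range that PIL's 'HSV' mode expects; a sketch using the question's imgHSV array:
import numpy as np
from PIL import Image

# imgHSV is the float array produced by matplotlib.colors.rgb_to_hsv (values in [0, 1]).
hsv8 = (imgHSV * 255).astype(np.uint8)
preIm = Image.fromarray(hsv8, 'HSV').convert('RGB')
preIm.show()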