How can I visualize the segmented image output of the Selective Search algorithm applied on an image?
import cv2
image = cv2.imread("x.jpg")
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()  # or ss.switchToSelectiveSearchQuality()
rects = ss.process()
That is, I want to get the segmented image shown on the right.
I am not sure, but I think the image you want may not be obtainable.
The reason:
First, open this file containing the source code.
In lines 726-734, the variable "images" is private, and in the method switchToSelectiveSearchQuality() at line 828, the different images used for the computation are stored in that private variable (follow the addImage function to see this).
The images stored in "images" are then passed to the segmentation step at line 901. The method called there is processImage() of the class "GraphSegmentation", which I was not able to trace back.
Thus, the image you want is possibly not stored anywhere, or it is stored in a private variable that we cannot access.
EDIT: I found the "GraphSegmentation" class and its "processImage" method declared in this file at lines 46 and 52.
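If the goal is the colored region map itself, one option is to run the graph-based segmentation step that Selective Search starts from, which the EDIT above locates. The following is a sketch, assuming the opencv-contrib-python package (which provides cv2.ximgproc) is installed: createGraphSegmentation() builds the segmenter, and its processImage() returns an int32 label image with one segment id per pixel. Selective Search then merges these segments hierarchically, so this shows only the initial over-segmentation, which may not match the image you linked exactly. The helper names colorize_labels and segment_and_show are hypothetical, and the sigma/k/min_size values are OpenCV's defaults, meant to be tuned.

```python
import numpy as np


def colorize_labels(labels, seed=0):
    """Map an integer label image (H x W) to a BGR image with one random color per segment."""
    rng = np.random.default_rng(seed)
    n_labels = int(labels.max()) + 1
    # One random color per label id; indexing the palette with the label image
    # looks up the color of every pixel at once
    palette = rng.integers(0, 256, size=(n_labels, 3), dtype=np.uint8)
    return palette[labels]


def segment_and_show(path, sigma=0.5, k=300, min_size=100):
    """Run OpenCV's graph-based segmentation on an image file and display the segments."""
    import cv2  # cv2.ximgproc needs the opencv-contrib-python package

    image = cv2.imread(path)
    if image is None:
        raise FileNotFoundError(path)
    gs = cv2.ximgproc.segmentation.createGraphSegmentation(sigma=sigma, k=k, min_size=min_size)
    labels = gs.processImage(image)  # int32 array, one segment id per pixel
    cv2.imshow("Segments", colorize_labels(labels))
    cv2.waitKey(0)
    cv2.destroyAllWindows()
```

Calling segment_and_show("x.jpg") opens a window with each segment in its own color; a larger k favors fewer, larger segments.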
I think you can use the following. I tried it, and it works:
import cv2, random
image = cv2.imread("x.jpg")
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()
rects = ss.process()
for i in range(0, len(rects), 100):
    # clone the original image so we can draw on it
    output = image.copy()
    # loop over the current subset of region proposals
    for (x, y, w, h) in rects[i:i + 100]:
        # draw the region proposal bounding box on the image in a random color
        color = [random.randint(0, 255) for j in range(0, 3)]
        cv2.rectangle(output, (x, y), (x + w, y + h), color, 2)
    cv2.imshow("Output", output)
    key = cv2.waitKey(0) & 0xFF
    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break
cv2.destroyAllWindows()
Why 100? It is just the chunk size I chose: each window shows the next 100 proposals. Press any key for the next chunk, or `q` to quit.
Original Image:
After processing:
I am very new to OpenCV and Python, so for my first object-recognition project I am using a small Python test file. This is the test file:
import cv2
from matplotlib import pyplot as plt
# Opening image
img = cv2.imread("image.jpg")
# OpenCV opens images as BGR,
# but we want them as RGB. We'll
# also need a grayscale version
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# Use minSize so we don't bother
# with extra-small dots that
# would look like STOP signs
stop_data = cv2.CascadeClassifier('cascade.xml')
found = stop_data.detectMultiScale(img_gray,
                                   minSize=(20, 20))
amount_found = len(found)
if amount_found != 0:
    # There may be more than one
    # sign in the image
    for (x, y, width, height) in found:
        # We draw a green rectangle around
        # every recognized sign
        cv2.rectangle(img_rgb, (x, y),
                      (x + width, y + height),
                      (0, 255, 0), 5)
# Creates the environment of
# the picture and shows it
plt.subplot(1, 1, 1)
plt.imshow(img_rgb)
plt.show()
The tutorial I followed told me to use a premade cascade.xml file. However, I wanted to train my own cascade file to recognize a simple Apple logo, which I cropped from a photo. I copy-pasted the same image multiple times into one folder titled p. For the negative images, I used a folder titled n. After this, I trained a cascade file with the Cascade Trainer GUI. However, when I use the same image (uncropped) in the Python program, there is no output.
This is the image I used for training:
This is the original image, which I passed to the Python script:
The original image is 422x612 and the cropped image is 285x354. I had set the size in the trainer to height=20 and width=15.
These are my settings:
Folder p:
Output of program:
Cascade GUI log end
NEG count : acceptanceRatio 0 : 0
Required leaf false alarm rate achieved. Branch training terminated.
Could someone tell me where I am going wrong? Please ask if I need to provide any more information. Is there some ratio between the positive image and the actual image that I am missing?
UPDATE: I added some more negative images and am now getting this output:
I work at a studio that does school photos and we are trying to make a script to eliminate the job of cropping each photo to a template. The photos we work with are fairly uniform but they vary in resolution and head position a bit. I took up the mantle of trying to write the script with my fairly limited Python knowledge and through a lot of trial and error and online resources I think I have got most of the way there.
At the moment I am trying to figure out the best way to crop the image from the NumPy array with the head where I want it, and I just can't find a good, flexible solution. The head needs to be positioned slightly differently for pose 1 and pose 2, so it needs to be easy to change on the fly (I'll probably implement some sort of simple GUI to input values like that, but for now I can just change the code).
I also need to be able to change the output resolution of the photo so they are all uniform (2000x2500). Anyone have any ideas?
This is my current code; at the moment it just saves the detected face square:
import cv2
import os.path
import glob
# Cascade path
cascPath = 'haarcascade_frontalface_default.xml'
# Create the haar cascade
faceCascade = cv2.CascadeClassifier(cascPath)
# Check for output folder and create it if it's not there
if not os.path.exists('output'):
    os.makedirs('output')
# Read Images
images = glob.glob('*.jpg')
for c, i in enumerate(images):
    image = cv2.imread(i, 1)
    # Convert to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Find face(s) using cascade
    faces = faceCascade.detectMultiScale(
        gray,
        scaleFactor=1.1,  # how much the image is shrunk at each detection scale
        minNeighbors=5,  # how many overlapping detections a candidate needs to count as a face
        minSize=(500, 500)  # minimum face size in pixels
    )
    # Outputs number of faces found in image
    print('Found {0} faces!'.format(len(faces)))
    # Crops the image to the face
    for (x, y, w, h) in faces:
        imgCrop = image[y:y+h, x:x+w]
    if len(faces) > 0:
        # Saves images to output folder with the original name
        cv2.imwrite('output/' + i, imgCrop)
I can crop using it like this:
# Crop Padding
left = 300
right = 300
top = 400
bottom = 1000
for (x, y, w, h) in faces:
    imgCrop = image[y-top:y+h+bottom, x-left:x+w+right]
But that produces fairly random output resolutions that change with the input image resolution.
To set a new resolution, you can use cv2.resize. Resizing loses pixels or has to interpolate new ones, so choose an interpolation method.
cv2.imwrite expects BGR channel order. An image read with cv2.imread is already BGR; if the crop comes from an RGB source (such as face_recognition.load_image_file), convert it to BGR before saving.
crop = cv2.resize(src=crop, dsize=(2000, 2500), interpolation=cv2.INTER_LANCZOS4)
crop = cv2.cvtColor(crop, cv2.COLOR_RGB2BGR)  # only if crop is RGB; skip for images from cv2.imread
cv2.imwrite("image-1.png", crop)
One approach is to use Python's face_recognition library.
It uses two sample images for training and then predicts the faces in the next image based on those training images.
For instance, these are the training images:
We want to predict the faces in the image below:
We get the facial encodings of the training images and apply them to the next image:
import cv2
import face_recognition
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw
# Load a sample picture and learn how to recognize it.
first_image = face_recognition.load_image_file("images/ex.jpg")
first_face_encoding = face_recognition.face_encodings(first_image)[0]
# Load a second sample picture and learn how to recognize it.
second_image = face_recognition.load_image_file("images/index.jpg")
sec_face_encoding = face_recognition.face_encodings(second_image)[0]
# Create a list of known face encodings
known_face_encodings = [
    first_face_encoding,
    sec_face_encoding,
]
print('Learned encoding for', len(known_face_encodings), 'images.')
# Load an image with an unknown face
unknown_image = face_recognition.load_image_file("images/babes.jpg")
# Find all the faces and face encodings in the unknown image
face_locations = face_recognition.face_locations(unknown_image)
face_encodings = face_recognition.face_encodings(unknown_image, face_locations)
# Convert the image to a PIL-format image so that we can draw on top of it with the Pillow library
# See the Pillow documentation for more about PIL/Pillow
pil_image = Image.fromarray(unknown_image)
# Create a Pillow ImageDraw Draw instance to draw with
draw = ImageDraw.Draw(pil_image)
# Loop through each face found in the unknown image
for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
    matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
    face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
    best_match_index = np.argmin(face_distances)
    draw.rectangle(((left, top), (right, bottom)), outline=(0, 0, 255), width=5)
# Remove the drawing library from memory as per the Pillow docs
del draw
# Display the resulting image
plt.imshow(pil_image)
plt.show()
The output will be:
That is my suggestion. When you change the resolution of an image, pixels are lost or have to be interpolated, so you need to choose an interpolation method.
For instance, after finding the face locations, crop those coordinates from the original image:
# Add after the draw.rectangle call, inside the loop.
crop = unknown_image[top:bottom, left:right]
Set the new resolution to 2000 x 2500 with cv2.INTER_LANCZOS4 interpolation:
Possible question: why cv2.INTER_LANCZOS4?
Of course, you can select whichever method you like, but cv2.INTER_LANCZOS4 was suggested in this post.
crop = cv2.resize(src=crop, dsize=(2000, 2500), interpolation=cv2.INTER_LANCZOS4)
Save the image:
crop = cv2.cvtColor(crop, cv2.COLOR_RGB2BGR)  # face_recognition loads RGB; cv2.imwrite expects BGR
cv2.imwrite("image-1.png", crop)
The outputs are around 4.3 MB, so I can't display them here.
In the final result, the faces are clearly found and identified; the library locates them precisely.
Here is what you can do:
Either use training images from your own set, or use the example above.
Apply face recognition to each image, crop using the detected face locations, and save the results to the output directory.
Here is how I got it to crop the way I wanted. This is added right below the "Outputs number of faces" print call:
# Get the face position and store the values in variables (might not be needed, but I did it)
for (x, y, w, h) in faces:
    xdis = x
    ydis = y
    w = w
    h = h
# Get the scale values by dividing the wanted head height by the detected head height
ws = 600/w
hs = 600/h
# Scale the image so the head is the right size (fx scales width, fy scales height);
# cv2.resize uses bilinear interpolation by default
scale = cv2.resize(image, (0, 0), fx=ws, fy=hs)
# Calculate the head position for the given values
sxdis = int(xdis*ws)  # apply the scale to the x distance and turn it into an integer
sydis = int(ydis*hs)  # apply the scale to the y distance and turn it into an integer
sycent = sydis+300  # add half the head height to get the center
ystart = sycent-700  # subtract where you want the head center to be in pixels (vertical)
yend = ystart+2500  # add whatever you want the vertical resolution to be
xcent = sxdis+300  # add half the head width to get the center
xstart = xcent-1000  # subtract where you want the head center to be in pixels (horizontal)
xend = xstart+2000  # add whatever you want the horizontal resolution to be
# Crop the image
cropped = scale[ystart:yend, xstart:xend]
It's a mess, but it works exactly how I wanted it to.
I ended up going with OpenCV instead of switching to face_recognition because of speed, but I might switch over if I can get multithreading to work with face_recognition.
So I would like to make a program that can detect an object by color, position, and sharpness.
So far I can detect the object by color and draw its contour and bounding box.
My problem is that I really don't know how to cut the object out of the picture and save it as an image file when the program recognizes its contour or bounding box.
Here's a picture of what my camera is seeing:
I would like to cut out what is inside the green bounding box once per frame, for as long as the object is visible in the video. So if the video is 30 fps and the object is visible for 10 seconds, it needs to take 300 pictures.
Here is the code:
I know it looks bad; I'm just trying to figure out what to use to make it work.
import cv2 as cv
import numpy as np
import os
import uuid
cap = cv.VideoCapture(1)
path = os.getcwd()
def createFolder(directory):
    try:
        if not os.path.exists(directory):
            os.makedirs(directory)
    except OSError:
        print('Error: Creating directory. ' + directory)
# folderName = '%s' % (str(uuid.uuid4()))
createFolder(os.path.join(path, "data"))
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    hsv = cv.cvtColor(frame, cv.COLOR_BGR2HSV)
    # blue is the chosen one for now
    lower_color = np.array([82, 33, 39])
    upper_color = np.array([135, 206, 194])
    mask = cv.inRange(hsv, lower_color, upper_color)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv.erode(mask, kernel)
    contours, hierarchy = cv.findContours(mask, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
    # find contour
    for contour in contours:
        area = cv.contourArea(contour)
        x, y, w, h = cv.boundingRect(contour)
        if area > 100:
            # bounding box
            # cv.rectangle(frame, (x - 40, y - 30), (x + w * 3, y + h * 3), (0, 255, 0), 1)
            # cutting and saving
            ext_left = tuple(contour[contour[:, :, 0].argmin()][0] - 20)
            ext_right = tuple(contour[contour[:, :, 0].argmax()][0] + 20)
            ext_top = tuple(contour[contour[:, :, 1].argmin()][0] - 20)
            ext_bot = tuple(contour[contour[:, :, 1].argmax()][0] + 20)
            outfile = '%s.jpg' % (str(uuid.uuid4()))
            # clamp the start at 0 so negative indices don't wrap around
            cropped_image = frame[max(ext_top[1], 0):ext_bot[1], max(ext_left[0], 0):ext_right[0]]
            # write images to the data folder ("data" has no leading slash, so it stays under path)
            cv.imwrite(os.path.join(path, "data", outfile), cropped_image)
    # outputs
    cv.imshow("Frame", frame)
    cv.imshow("Mask", mask)
    key = cv.waitKey(1)
    if key == 27:
        break
cap.release()
cv.destroyAllWindows()
Focusing on the question and ignoring the code style, I can say you are close to achieving your goal :)
For cropping the object, you can use the Mat copyTo method. Here is the official OpenCV documentation and here is an example from the OpenCV forums.
Now, to create the mask from the contours, you can use the same drawContours method you already use, but pass a negative value for the thickness parameter (for example, thickness=CV_FILLED in C++, or cv2.FILLED / -1 in Python). You can see a code snippet in this stackoverflow post and check the details in the official documentation.
For saving the image to disk you can use imwrite.
So, in a nutshell: draw filled contours into a mask, and use that mask to copy only the object pixels from the video frame to another Mat that you can save to disk.
Instead of posting code, I will share this very similar question with an accepted answer that may have the code snippet you are looking for.
def apply_alpha(img, alpha_value):
    print("alpha_value" + str(alpha_value))
    mask_value = int(alpha_value * 255)
    print("mask_value" + str(mask_value))
    img.putalpha(mask_value)
    return img
def apply_alpha(img, alpha_value):
    import copy
    tmp = copy.copy(img)
    print("alpha_value" + str(alpha_value))
    mask_value = int(alpha_value * 255)
    print("mask_value" + str(mask_value))
    tmp.putalpha(mask_value)
    return tmp
working_image = apply_alpha(obs, alpha)
I tried both of the above apply_alpha functions, where "img" is a PIL image, and neither of them correctly apply alpha (nothing changes).
I am stitching together individual tiles of a composite image and using putalpha to set the transparency of each tile. I believe the paste step that merges the tiles is erasing the putalpha on each individual image. How can I get this to work?
I'm using this merge_images function to stitch together the individual tile images: Stitching Photos together
This scenario is distinct from other questions because img.putalpha(...) is used within a function, which causes it to not work.
I figured it out: the cause of the issue was that, in the merge function for the images, there is this code:
result = Image.new('RGB', (result_width, result_height))
result.paste(im=img1, box=(0, 0), mask=img1)
result.paste(im=img2, box=(width1, 0), mask=img2)
Because the result image was created in "RGB" mode, it has no alpha channel, so the tiles' transparency was lost when they were composited. Make sure the result image is created in "RGBA" mode.
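A minimal sketch of the fix with Pillow, using hypothetical helper names: build the canvas with Image.new("RGBA", ...) and keep the tiles' alpha. This version pastes without a mask, which copies each tile's RGBA values unchanged. With mask=tile, Pillow mixes the alpha channels too (intermediate mask values blend both images, including their alpha), so a half-transparent tile on a transparent canvas would end up more transparent. Here apply_alpha also returns a converted copy instead of modifying its argument.

```python
from PIL import Image


def apply_alpha(img, alpha_value):
    """Return an RGBA copy of img whose alpha is alpha_value (0.0 to 1.0) everywhere."""
    out = img.convert("RGBA")  # convert() always returns a new image
    out.putalpha(int(alpha_value * 255))
    return out


def merge_images(img1, img2):
    """Place img2 to the right of img1 on a fully transparent RGBA canvas."""
    w1, h1 = img1.size
    w2, h2 = img2.size
    result = Image.new("RGBA", (w1 + w2, max(h1, h2)), (0, 0, 0, 0))
    # Pasting without a mask copies each tile's RGBA values, alpha included
    result.paste(img1, (0, 0))
    result.paste(img2, (w1, 0))
    return result
```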
I have a strange output in my images: all the characters are surrounded by grey pixels. I am 90% sure this is caused by an OpenCV-to-PIL conversion issue, but I don't know how to solve it.
Here is the source image:
And the output (you need to zoom in to see the grey pixels):
A detail:
This is the code I am using:
import cv2
import tesserocr as tr
from PIL import Image
import os
src = (os.path.expanduser('~\\Desktop\\output4\\'))
causali = os.listdir(src)  # CREATE LIST OF CAUSALI
causali.sort(key=lambda x: int(x.split('.')[0]))
for file in enumerate(causali):  # COUNT NUMBER OF CAUSALE FILES
    cv_img = cv2.imread(os.path.expanduser('~\\Desktop\\output4\\{}'.format(file[1])), cv2.IMREAD_UNCHANGED)
    # since tesserocr accepts PIL images, converting opencv image to pil
    pil_img = Image.fromarray(cv2.cvtColor(cv_img, cv2.COLOR_BGR2RGB))
    # initialize api
    api = tr.PyTessBaseAPI()
    # set pil image for ocr
    api.SetImage(pil_img)
    # Google tesseract-ocr has a page segmentation method(psm) option for specifying ocr types
    # psm values can be: block of text, single text line, single word, single character etc.
    # api.GetComponentImages method exposes this functionality
    # function returns:
    # image (:class:`PIL.Image`): Image object.
    # bounding box (dict): dict with x, y, w, h keys.
    # block id (int): textline block id (if blockids is ``True``). ``None`` otherwise.
    # paragraph id (int): textline paragraph id within its block (if paraids is True).
    # ``None`` otherwise.
    boxes = api.GetComponentImages(tr.RIL.BLOCK, True)
    # get text
    text = api.GetUTF8Text()
    # iterate over returned list, draw rectangles
    for (im, box, _, _) in boxes:
        x, y, w, h = box['x'], box['y'], box['w'], box['h']
        cv_rect = cv2.rectangle(cv_img, (x-10, y-10), (x + w+10, y + h+10), color=(255, 255, 255), thickness=1)
    '~\\Desktop\\output5\\{}.png').format(file[0]))
Is there a way to make api.SetImage() accept an OpenCV image?
EDIT: Is there a way to delete all grey pixels by specifying their color?
You need to use a binary thresholding algorithm to filter out the "noise" in your image.
C++ docs
Python docs
So, this is my solution. I found a way to use OpenCV instead of PIL, as long as OpenCV doesn't convert the image to JPEG during the process.
This gives a clean image from input to output.
Here is the code:
import cv2
import tesserocr as tr
from PIL import Image
import os
cv_img = cv2.imread('C:\\Users\\Link\\Desktop\\0.png', cv2.IMREAD_UNCHANGED)
idx = 0
# since tesserocr accepts PIL images, converting opencv image to pil
pil_img = Image.fromarray(cv_img)
# initialize api
api = tr.PyTessBaseAPI()
# set pil image for ocr
api.SetImage(pil_img)
# Google tesseract-ocr has a page segmentation method(psm) option for specifying ocr types
# psm values can be: block of text, single text line, single word, single character etc.
# api.GetComponentImages method exposes this functionality
# function returns:
# image (:class:`PIL.Image`): Image object.
# bounding box (dict): dict with x, y, w, h keys.
# block id (int): textline block id (if blockids is ``True``). ``None`` otherwise.
# paragraph id (int): textline paragraph id within its block (if paraids is True).
# ``None`` otherwise.
boxes = api.GetComponentImages(tr.RIL.TEXTLINE, True)
# get text
text = api.GetUTF8Text()
# iterate over returned list, draw rectangles
for (im, box, _, _) in boxes:
    x, y, w, h = box['x'], box['y'], box['w'], box['h']
    cv_rect = cv2.rectangle(cv_img, (x-10, y-10), (x + w+10, y + h+10), color=(255, 255, 255), thickness=1)
    roi = cv_rect[y:y + h, x:x + w]
    # number the crops with idx so each text line gets its own file instead of overwriting one
    cv2.imwrite(os.path.expanduser('~\\Desktop\\output5\\{}.png'.format(idx)), roi)
    idx += 1