I have many skeletonized images like this:
How can i detect a cycle, a loop in the skeleton?
Are there "special" functions that do this or should I implement it as a graph?
In case there is only the graph option, can the python graph library NetworkX can help me?
You can exploit the topology of the skeleton. A cycle will have no holes, so we can use scipy.ndimage to find any holes and compare. This isn't the fastest method, but it's extremely easy to code.
import scipy.misc, scipy.ndimage
# Read the image
img = scipy.misc.imread("Skel.png")
# Retain only the skeleton
img[img!=255] = 0
img = img.astype(bool)
# Fill the holes
img2 = scipy.ndimage.binary_fill_holes(img)
# Compare the two, an image without cycles will have no holes
print "Cycles in image: ", ~(img == img2).all()
# As a test break the cycles
img3 = img.copy()
img3[0:200, 0:200] = 0
img4 = scipy.ndimage.binary_fill_holes(img3)
# Compare the two, an image without cycles will have no holes
print "Cycles in image: ", ~(img3 == img4).all()
I've used your "B" picture as an example. The first two images are the original and the filled version which detects a cycle. In the second version, I've broken the cycle and nothing gets filled, thus the two images are the same.
First, let's build an image of the letter B with PIL:
import Image, ImageDraw, ImageFont
image = Image.new("RGBA", (600,150), (255,255,255))
draw = ImageDraw.Draw(image)
fontsize = 150
font = ImageFont.truetype("/usr/share/fonts/truetype/liberation/LiberationMono-Regular.ttf", fontsize)
txt = 'B'
draw.text((30, 5), txt, (0,0,0), font=font)
img = image.resize((188,45), Image.ANTIALIAS)
print type(img)
plt.imshow(img)
you may find a better way to do that, particularly with path to the fonts. Ii would be better to load an image instead of generating it. Anyway, we have now something to work on:
Now, the real part:
import mahotas as mh
img = np.array(img)
im = img[:,0:50,0]
im = im < 128
skel = mh.thin(im)
noholes = mh.morph.close_holes(skel)
plt.subplot(311)
plt.imshow(im)
plt.subplot(312)
plt.imshow(skel)
plt.subplot(313)
cskel = np.logical_not(skel)
choles = np.logical_not(noholes)
holes = np.logical_and(cskel,noholes)
lab, n = mh.label(holes)
print 'B has %s holes'% str(n)
plt.imshow(lab)
And we have in the console (ipython):
B has 2 holes
Converting your skeleton image to a graph representation is not trivial, and I don't know of any tools to do that for you.
One way to do it in the bitmap would be to use a flood fill, like the paint bucket in photoshop. If you start a flood fill of the image, the entire background will get filled if there are no cycles. If the fill doesn't get the entire image then you've found a cycle. Robustly finding all the cycles could require filling multiple times.
This is likely to be very slow to execute, but probably much faster to code than a technique where you trace the skeleton into graph data structure.
Related
I have an image processing problem that I can't solve. I have a set of 375 images like the one below (1). I'm trying to remove the background, so to make "background substraction" (or "foreground extraction") and get only the waste on a plain background (black/white/...).
(1) Image example
I tried many things, including createBackgroundSubtractorMOG2 from OpenCV, or threshold. I also tried to remove the background pixel by pixel by subtracting it from the foreground because I have a set of 237 background images (2) (the carpet without the waste, but which is a little bit offset from the image with the objects). There are also variations in brightness on the background images.
(2) Example of a background image
Here is a code example that I was able to test and that gives me the results below (3) and (4). I use Python 3.8.3.
# Function to remove the sides of the images
def delete_side(img, x_left, x_right):
for i in range(img.shape[0]):
for j in range(img.shape[1]):
if j<=x_left or j>=x_right:
img[i,j] = (0,0,0)
return img
# Intialize the background model
backSub = cv2.createBackgroundSubtractorMOG2(history=250, varThreshold=2, detectShadows=True)
# Read the frames and update the background model
for frame in frames:
if frame.endswith(".png"):
filepath = FRAMES_FOLDER + '/' + frame
img = cv2.imread(filepath)
img_cut = delete_side(img, x_left=190, x_right=1280)
gray = cv2.cvtColor(img_cut, cv2.COLOR_BGR2GRAY)
mask = backSub.apply(gray)
newimage = cv2.bitwise_or(img, img, mask=mask)
img_blurred = cv2.GaussianBlur(newimage, (5, 5), 0)
gray2 = cv2.cvtColor(img_blurred, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray2, 10, 255, cv2.THRESH_BINARY)
final = cv2.bitwise_or(img, img, mask=binary)
newpath = RESULT_FOLDER + '/' + frame
cv2.imwrite(newpath, final)
I was inspired by many other cases found on Stackoverflow or others (example: removing pixels less than n size(noise) in an image - open CV python).
(3) The result obtained with the code above
(4) Result when increasing the varThreshold argument to 10
Unfortunately, there is still a lot of noise on the resulting pictures.
As a beginner in "background substraction", I don't have all the keys to get an optimal solution. If someone would have an idea to do this task in a more efficient and clean way (Is there a special method to handle the case of transparent objects? Can noise on objects be eliminated more effectively? etc.), I'm interested :)
Thanks
Thanks for your answers. For information, I simply change of methodology and use a segmentation model (U-Net) with 2 labels (foreground, background), to identify the background. It works quite well.
I need to find the largest empty area in the document and display its coordinates, center point and area, using python to put a QR Code there.
I think OpenCV and Numpy should be enough for this task.
What kinda THRESH to use? Because there are a lot of types of scans:
gray, BW, with color, and how to find the contour properly?
How this can be implemented in the fastest way? An example using the
first scan from google is attached, where you can see that the code
should find the largest empty square area.
#Mark Setchell Thanks! This code works perfectly for all docs with a white background, but when I use smth with a color in the background it finds a completely different area. Also, to keep thin lines in the docs I used Erode after thresholding. Tried to change thresholding and erode parameters, still not working properly.
Edited post, added color pictures.
Here's a possible approach:
#!/usr/bin/env python3
import cv2
import numpy as np
def largestSquare(im):
# Make image square of 100x100 to simplify and speed up
s = 100
work = cv2.resize(im, (s,s), interpolation=cv2.INTER_NEAREST)
# Make output accumulator - uint16 is ok because...
# ... max value is 100x100, i.e. 10,000 which is less than 65,535
# ... and you can make a PNG of it too
p = np.zeros((s,s), np.uint16)
# Find largest square
for i in range(1, s):
for j in range(1, s):
if (work[i][j] > 0 ):
p[i][j] = min(p[i][j-1], p[i-1][j], p[i-1][j-1]) + 1
else:
p[i][j] = 0
# Save result - just for illustration purposes
cv2.imwrite("result.png",p)
# Work out what the actual answer is
ind = np.unravel_index(np.argmax(p, axis=None), p.shape)
print(f'Location: {ind}')
print(f'Length of side: {p[ind]}')
# Load image and threshold
im = cv2.imread('page.png', cv2.IMREAD_GRAYSCALE)
_, thr = cv2.threshold(im,127,255,cv2.THRESH_BINARY | cv2.THRESH_OTSU)
# Get largest white square
largestSquare(thr)
Output
Location: (21, 77)
Length of side: 18
Notes:
I edited out your red annotation so it didn't interfere with my algorithm.
I did Otsu thresholding to get pure black and white - that may or may not be appropriate to your use case. It will depend on your scans and paper background etc.
I scaled the image down to 100x100 so it doesn't take all day to run. You will need to scale the results back up to the size of your original image but I assume you can do that easily enough.
Keywords: Image processing, image, Python, OpenCV, largest white square, largest empty space.
I've been trying to convert stereo images into a depth map with use of opencv, but not matter what I do it seems to come out unreadable.
I was able to get an accurate depth image of example images that were provided in the opencv tutorial but not on any other image. Even when I attempted to download other premade, calibrated stereo image from online I get terrible results that are neither accurate nor are even close to quality that I get with the example images.
here is my main python script that I use to make the depth map:
import numpy as np
import cv2
from matplotlib import pyplot as plt
imgL = cv2.imread('calimg_L.png',0)
imgR = cv2.imread('calimg_R.png',0)
# imgL = cv2.imread('./images/example_L.png',0)
# imgR = cv2.imread('./images/example_R.png',0)
stereo = cv2.StereoSGBM_create(numDisparities=16, blockSize=15)
disparity = stereo.compute(imgR,imgL)
norm_image = cv2.normalize(disparity, None, alpha = 0, beta = 1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
cv2.imwrite("disparityImage.jpg", norm_image)
plt.imshow(norm_image)
plt.show()
where calimg_L.png is a calibrated version of the original image.
Here is the code I use to calibrate my images:
import numpy as np
import cv2
import glob
from matplotlib import pyplot as plt
def createCalibratedImage(inputImage, outputName):
# termination criteria
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# prepare object points, like (0,0,0), (1,0,0), (2,0,0) ....,(6,5,0)
objp = np.zeros((3*3,3), np.float32)
objp[:,:2] = np.mgrid[0:3,0:3].T.reshape(-1,2)
# Arrays to store object points and image points from all the images.
objpoints = [] # 3d point in real world space
imgpoints = [] # 2d points in image plane.
# org = cv2.imread('./chess.jpg')
# orig_cal_img = cv2.resize(org, (384, 288))
# cv2.imwrite("cal_chess.jpg", orig_cal_img)
images = glob.glob('./chess_webcam/*.jpg')
for fname in images:
print('file in use: ' + fname)
img = cv2.imread(fname)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# Find the chess board corners
ret, corners = cv2.findChessboardCorners(gray, (3,3),None)
# print("doing the thing");
print('status: ' + str(ret));
# If found, add object points, image points (after refining them)
if ret == True:
# print("found something");
objpoints.append(objp)
cv2.cornerSubPix(gray,corners,(11,11),(-1,-1),criteria)
imgpoints.append(corners)
# Draw and display the corners
cv2.drawChessboardCorners(img, (3,3), corners,ret)
cv2.imshow('img',img)
cv2.waitKey(500)
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1],None,None)
img = inputImage
h, w = img.shape[:2]
newcameramtx, roi=cv2.getOptimalNewCameraMatrix(mtx,dist,(w,h),1,(w,h))
# undistort
print('undistorting...')
mapx,mapy = cv2.initUndistortRectifyMap(mtx,dist,None,newcameramtx,(w,h),5)
dst = cv2.remap(inputImage ,mapx,mapy,cv2.INTER_LINEAR)
# crop the image
x,y,w,h = roi
dst = dst[y:y+h, x:x+w]
# cv2.imwrite('calibresult.png',dst)
cv2.imwrite(outputName + '.png',dst)
cv2.destroyAllWindows()
original_L = cv2.imread('capture_L.jpg')
original_R = cv2.imread('capture_R.jpg')
createCalibratedImage(original_R, "calimg_R")
createCalibratedImage(original_L, "calimg_L")
print("images calibrated and outputed")
This code was taken from opencv tutorial on how to calibrate images and was provided at least 16 images of the chess board, but was only able to identify the chessboard in about 4 - 5 of them. The reason I used such a relatively small grid search of 3x3 is because anything higher left me without any images to use for calibration due to its inability to find the chessboard.
Here is what I get from an example image(sorry for weird link, couldn't find how to upload):
https://ibb.co/DYMcdZc
here is the original:
https://ibb.co/gMkqyXD
https://ibb.co/YQZY40C
This acts a it should, but when I use it with any other image it gives me a mess, for example:
output:
https://ibb.co/kXwgDVn
looks like just a mess of pixels, to be fair when you put it into 'gray' on imshow it looks more readable but it is not very representative of the image's depth, here are the originals:
https://ibb.co/vqDKGS0
https://ibb.co/f0X1gMB
Even worse so, when I take images myself and do calibrate them through the chessboard code, it comes out as just a random mess of white and black pixels, and values of some goes into negatives and some pixels are impossibly high value.
tl;dr I can't get any stereo images to be made into a depth map even though the example image works just fine, why is that?
First I want to say that obtaining a good depth map is not such a simple task, and using the basic StereoMatching won't always lead to good results. Nevertheless, something better can be achieved.
In order:
Calibration: you should be able to find the checkerboard in more images, 4/5 is a very low number for calibration, it is very hard to estimate correctly the camera parameters with such low number. How do the images look like? Did you read them as grayscale images? Usually also using a different number for row and column (not 3x3 grid, like 4x3) helps to understand the checkerboard position (otherwise it could be ambiguous which side is up or right, for example, a 90 rotation would result in 0 rotation).
Rectification: this can be easily checked by looking at the images. Open two images on two different layers (using GIMP or similar) and check for similar points. After you rectified the images, they should lie on the same line. Are they really on the same line? If yes, rectification work, otherwise, you need a better calibration. The stereo matching won't work without this step.
Stereo Matching: if all above steps are correct, then you may have a problem on the parameters of the stereo matching. First thing to check is disparity range (since it looks like you have different resolution between example images and your images, you should check and adapt that value). Min disparity can also help (if you reduce the disparity range, you reduce the error possibilities) and also block size (15 is quite big, smaller is also enough).
From what you say, my guess would be the problem is on the calibration. You should try to check the rectified images, and if the problem is there try to acquire a new dataset (or find online a better one) and calibrate your images there. Once you can calibrate and rectify your images correctly, you should get better results.
I see the code is similar to the tutorial here so I guess that's correct and the main problem are the images. Hope this can help,I can help you more if you test and see where the probelm is!
I'm trying to remove the background from product images, save them as transparent png's and got to a point where I can't figure out how and why I get the white line around the products like a fuzziness(see second image) don't know the real word for the effect. Also I'm losing the Nike swoosh which is white too :(
from PIL import Image
img = Image.open('test.jpg')
img = img.convert("RGBA")
datas = img.getdata()
newData = []
for item in datas:
if item[0] > 247 and item[1] > 247 and item[2] > 247:
newData.append((255, 255, 255, 0))
else:
newData.append(item)
img.putdata(newData)
img.save("test.png", "PNG")
Any ideas how I can fix this so I get clean selections, edges ?
Take a copy of your image and use PIL/Pillow's ImageDraw.floodfill() to flood fill from the top-left corner using a reasonable tolerance - that way you will only fill to the edges of the shirt and avoid the Nike logo.
Then take the background outline and make it white and everything else black and try applying some morphology (from scikit-image maybe) to dilate the white a little larger to hide the jaggies.
Finally, put the resulting new layer into the image with putalpha().
I am really pushed for time, but here are the bones of it. Just missing the copy of the original image at the start and the putalpha() of the new alpha layer back at the end...
from PIL import Image, ImageDraw
import numpy as np
import skimage.morphology
# Open the shirt
im = Image.open('shirt.jpg')
# Make all background pixels (not including Nike logo) into magenta (255,0,255)
ImageDraw.floodfill(im,xy=(0,0),value=(255,0,255),thresh=10)
# DEBUG
im.show()
Experiment with the threshold (thresh) here. If you make it 50, it works much more cleanly and may be good enough to stop.
# Make into Numpy array
n = np.array(im)
# Mask of magenta background pixels
bgMask =(n[:, :, 0:3] == [255,0,255]).all(2)
# DEBUG
Image.fromarray((bgMask*255).astype(np.uint8)).show()
# Make a disk-shaped structuring element
strel = skimage.morphology.disk(13)
# Perform a morphological closing with structuring element
closed = skimage.morphology.binary_closing(bgMask,selem=strel)
# DEBUG
Image.fromarray((closed*255).astype(np.uint8)).show()
If you are unfamiliar with morphology, Anthony Thyssen has some excellent noes worth reading here.
By the way, you could also use potrace to smooth the outline somewhat.
I had a bit more time today so here is a more complete version. You can experiment with the morphology disk sizes and floodfill thresholds according to your images till you find something tailored for your needs:
#!/bin/env python3
from PIL import Image, ImageDraw
import numpy as np
import skimage.morphology
# Open the shirt and make a clean copy before we dink with it too much
im = Image.open('shirt.jpg')
orig = im.copy()
# Make all background pixels (not including Nike logo) into magenta (255,0,255)
ImageDraw.floodfill(im,xy=(0,0),value=(255,0,255),thresh=50)
# DEBUG
im.show()
# Make into Numpy array
n = np.array(im)
# Mask of magenta background pixels
bgMask =(n[:, :, 0:3] == [255,0,255]).all(2)
# DEBUG
Image.fromarray((bgMask*255).astype(np.uint8)).show()
# Make a disk-shaped structuring element
strel = skimage.morphology.disk(13)
# Perform a morphological closing with structuring element to remove blobs
newalpha = skimage.morphology.binary_closing(bgMask,selem=strel)
# Perform a morphological dilation to expand mask right to edges of shirt
newalpha = skimage.morphology.binary_dilation(newalpha, selem=strel)
# Make a PIL representation of newalpha, converting from True/False to 0/255
newalphaPIL = (newalpha*255).astype(np.uint8)
newalphaPIL = Image.fromarray(255-newalphaPIL, mode='L')
# DEBUG
newalphaPIL.show()
# Put new, cleaned up image into alpha layer of original image
orig.putalpha(newalphaPIL)
orig.save('result.png')
As regards using potrace to smooth the outline, you would save new alphaPIL as a PGM format image because that is what potrace likes as input. So that would be:
newalphaPIL.save('newalpha.pgm')
Now you can play around, oops I meant "experiment carefully" with potrace to smooth the alpha outline. The basic command is:
potrace -b pgm newalpha.pgm -o smoothalpha.pgm
You can then re-load the image smoothalpha.pgm back into your Python and use it on the last line in the putalpha() call. Here is an animation of the difference between the original unsmoothed alpha and the smoothed one:
Look carefully at the edges to see the difference. You may want to experiment with resizing the alpha either to twice the size or half the size before smoothing to see what effect that has.
I have some hundreds of images (scanned documents), most of them are skewed. I wanted to de-skew them using Python.
Here is the code I used:
import numpy as np
import cv2
from skimage.transform import radon
filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if (w > 640):
I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I) # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))
# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2,h/2),90 - rotation,1)
dst = cv2.warpAffine(img,M,(w,h))
cv2.imwrite('rotated.jpg', dst)
This code works well with most of the documents, except with some angles: (180 and 0) and (90 and 270) are often detected as the same angle (i.e it does not make difference between (180 and 0) and (90 and 270)). So I get a lot of upside-down documents.
Here is an example:
The resulted image that I get is the same as the input image.
Is there any suggestion to detect if an image is upside down using Opencv and Python?
PS: I tried to check the orientation using EXIF data, but it didn't lead to any solution.
EDIT:
It is possible to detect the orientation using Tesseract (pytesseract for Python), but it is only possible when the image contains a lot of characters.
For anyone who may need this:
import cv2
import pytesseract
print(pytesseract.image_to_osd(cv2.imread(file_name)))
If the document contains enough characters, it is possible for Tesseract to detect the orientation. However, when the image has few lines, the orientation angle suggested by Tesseract is usually wrong. So this can not be a 100% solution.
Python3/OpenCV4 script to align scanned documents.
Rotate the document and sum the rows. When the document has 0 and 180 degrees of rotation, there will be a lot of black pixels in the image:
Use a score keeping method. Score each image for it's likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability, the full code can be found here.
# Rotate the image around in a circle
angle = 0
while angle <= 360:
# Rotate the source image
img = rotate(src, angle)
# Crop the center 1/3rd of the image (roi is filled with text)
h,w = img.shape
buffer = min(h, w) - int(min(h,w)/1.15)
roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
# Create background to draw transform on
bg = np.zeros((buffer*2, buffer*2), np.uint8)
# Compute the sums of the rows
row_sums = sum_rows(roi)
# High score --> Zebra stripes
score = np.count_nonzero(row_sums)
scores.append(score)
# Image has best rotation
if score <= min(scores):
# Save the rotatied image
print('found optimal rotation')
best_rotation = img.copy()
k = display_data(roi, row_sums, buffer)
if k == 27: break
# Increment angle and try again
angle += .75
cv2.destroyAllWindows()
How to tell if the document is upside down? Fill in the area from the top of the document to the first non-black pixel in the image. Measure the area in yellow. The image that has the smallest area will be the one that is right-side-up:
# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
Assuming you did run the angle-correction already on the image, you can try the following to find out if it is flipped:
Project the corrected image to the y-axis, so that you get a 'peak' for each line. Important: There are actually almost always two sub-peaks!
Smooth this projection by convolving with a gaussian in order to get rid of fine structure, noise, etc.
For each peak, check if the stronger sub-peak is on top or at the bottom.
Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value that gives you the confidence that the image is oriented correctly.
The peak finding in step 3 is done by finding sections with above average values. The sub-peaks are then found via argmax.
Here's a figure to illustrate the approach; A few lines of you example image
Blue: Original projection
Orange: smoothed projection
Horizontal line: average of the smoothed projection for the whole image.
here's some code that does this:
import cv2
import numpy as np
# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]
# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)
# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]
# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
line = smooth[start:end]
if np.argmax(line) < len(line)/2:
lower_peaks += 1
print(lower_peaks / len(line_starts))
this prints 0.125 for the given image, so this is not oriented correctly and must be flipped.
Note that this approach might break badly if there are images or anything not organized in lines in the image (maybe math or pictures). Another problem would be too few lines, resulting in bad statistics.
Also different fonts might result in different distributions. You can try this on a few images and see if the approach works. I don't have enough data.
You can use the Alyn module. To install it:
pip install alyn
Then to use it to deskew images(Taken from the homepage):
from alyn import Deskew
d = Deskew(
input_file='path_to_file',
display_image='preview the image on screen',
output_file='path_for_deskewed image',
r_angle='offest_angle_in_degrees_to_control_orientation')`
d.run()
Note that Alyn is only for deskewing text.