I've done extensive research and cannot find a combination of techniques that will achieve what I need.
I have a situation where I need to perform OCR on hundreds of W2s to extract the data for a reconciliation. The W2s are very poor quality, as they are printed and subsequently scanned back into the computer. The aforementioned process is outside of my control; unfortunately I have to work with what I've got.
I was able to successfully perform this process last year, but I had to brute force it as timeliness was a major concern. I did so by manually indicating the coordinates to extract the data from, then performing the OCR only on those segments one at a time. This year, I would like to come up with a more dynamic solution, in anticipation that the coordinates could change, the format could change, etc.
I have included a sample, scrubbed W2 below. The idea is for each box on the W2 to be its own rectangle, and extract the data by iterating through all of the rectangles. I have tried several edge detection techniques but none have delivered exactly what is needed. I believe that I have not found the correct combination of pre-processing required. I have tried to mirror some of the Sudoku puzzle detection scripts.
Here is the result of what I have tried thus far, along with the Python code, which can be used with either OpenCV 2 or 3:
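import cv2
import numpy as np
img = cv2.imread(image_path_here)
# Halve the image size; cv2.resize needs integer dimensions
newx, newy = img.shape[1] // 2, img.shape[0] // 2
img = cv2.resize(img, (newx, newy))
blur = cv2.GaussianBlur(img, (3, 3), 5)
ret, thresh1 = cv2.threshold(blur, 225, 255, cv2.THRESH_BINARY)
gray = cv2.cvtColor(thresh1, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 220, apertureSize=3)
minLineLength = 20
maxLineGap = 50
# Pass these two by keyword: positionally, the fifth argument is the output array
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 100,
                        minLineLength=minLineLength, maxLineGap=maxLineGap)
# OpenCV 2 returns shape (1, N, 4), OpenCV 3 returns (N, 1, 4); reshape handles both
if lines is not None:
    for x1, y1, x2, y2 in lines.reshape(-1, 4):
        cv2.line(img, (x1, y1), (x2, y2), (255, 0, 255), 2)
cv2.imshow('hough', img)
cv2.waitKey(0)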
He he, edge detection is not the only way. As the edges are thick enough (at least one pixel everywhere), binarization allows you to singulate the regions inside the boxes.
By simple criteria you can get rid of clutter, and just bounding boxes give you a fairly good segmentation.
Let me know if you don't follow anything in my code. The biggest faults of this concept are:
1: Noisy breaks in the main box line would split it into separate blobs.
2: I don't know whether there can be handwritten text, but letters overlapping the edges of boxes could be bad.
3: It does absolutely no orientation checking (you may actually want to improve this, as I don't think it would be too bad and it would give you more accurate handles). What I mean is that it depends on your boxes being approximately aligned to the x and y axes; if they are sufficiently skewed, it will give you gross offsets on all your box corners (though it should still find them all).
I fiddled with the threshold set point a bit to get all the text separated from the edges; you could probably pull it even lower if necessary before you start breaking the main line. Also, if you are worried about line breaks, you could add together sufficiently large blobs into the final image.
Basically, the first step is fiddling with the threshold to find the most stable cutoff value (likely the lowest value that still keeps the box connected) for separating text and noise from the box.
Second, find the biggest positive blob (this should be the box grid). If your box doesn't stay together, you may want to take a few of the largest blobs, though that will get sticky, so try to set the threshold so that you get the grid as a single blob.
The last step is to get the rectangles. To do this, I just look for negative blobs (ignoring the first outer area).
And here is the code (sorry that it is in C++, but hopefully you understand the concept and would write it yourself anyhow):
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
#include <iostream>
#include <stdio.h>
#include <opencv2/opencv.hpp>
using namespace cv;
//Attempts to find the largest connected group of points (assumed to be the interconnected boundaries of the textbox grid)
Mat biggestComponent(Mat targetImage, int connectivity=8)
{
Mat inputImage;
inputImage = targetImage.clone();
Mat finalImage;// = inputImage;
int greatestBlobSize=0;
std::cout<<"Top"<<std::endl;
std::cout<<inputImage.rows<<std::endl;
std::cout<<inputImage.cols<<std::endl;
for(int i=0;i<inputImage.cols;i++)
{
for(int ii=0;ii<inputImage.rows;ii++)
{
if(inputImage.at<uchar>(ii,i)!=0)
{
Mat lastImage;
lastImage = inputImage.clone();
Rect boundbox; //must be a real object: floodFill writes the bounding box through the pointer
int blobSize = floodFill(inputImage, cv::Point(i,ii), Scalar(0),&boundbox,Scalar(200),Scalar(255),connectivity);
if(greatestBlobSize<blobSize)
{
greatestBlobSize=blobSize;
std::cout<<blobSize<<std::endl;
Mat tempDif = lastImage-inputImage;
finalImage = tempDif.clone();
}
//std::cout<<"Loop"<<std::endl;
}
}
}
return finalImage;
}
//Takes an image that only has outlines of boxes and gets handles for each textbox.
//Returns a vector of rectangles, one bounding box per text box.
std::vector<Rect> boxCorners(Mat processedImage, int connectivity=4)
{
std::vector<Rect> boxHandles;
Mat inputImage;
bool outerRegionFlag=true;
inputImage = processedImage.clone();
std::cout<<inputImage.rows<<std::endl;
std::cout<<inputImage.cols<<std::endl;
for(int i=0;i<inputImage.cols;i++)
{
for(int ii=0;ii<inputImage.rows;ii++)
{
if(inputImage.at<uchar>(ii,i)==0)
{
Mat lastImage;
lastImage = inputImage.clone();
Rect boundBox;
if(outerRegionFlag) //This is to floodfill the outer zone of the page
{
outerRegionFlag=false;
floodFill(inputImage, cv::Point(i,ii), Scalar(255),&boundBox,Scalar(0),Scalar(50),connectivity);
}
else
{
floodFill(inputImage, cv::Point(i,ii), Scalar(255),&boundBox,Scalar(0),Scalar(50),connectivity);
boxHandles.push_back(boundBox);
}
}
}
}
return boxHandles;
}
Mat drawTestBoxes(Mat originalImage, std::vector<Rect> boxes)
{
Mat outImage;
outImage = originalImage.clone();
outImage = outImage*0; //really I am just being lazy, this should just be initialized with dimensions
for(int i=0;i<boxes.size();i++)
{
rectangle(outImage,boxes[i],Scalar(255));
}
return outImage;
}
int main() {
Mat image;
Mat thresholded;
Mat processed;
image = imread( "Images/W2.png", 1 );
Mat channel[3];
split(image, channel);
threshold(channel[0],thresholded,150,255,THRESH_BINARY_INV); //type 1: dark ink becomes 255, paper becomes 0
std::cout<<"Computing biggest object"<<std::endl;
processed = biggestComponent(thresholded);
std::vector<Rect> textBoxes = boxCorners(processed);
Mat finalBoxes = drawTestBoxes(image,textBoxes);
namedWindow("Original", WINDOW_AUTOSIZE );
imshow("Original", channel[0]);
namedWindow("Thresholded", WINDOW_AUTOSIZE );
imshow("Thresholded", thresholded);
namedWindow("Processed", WINDOW_AUTOSIZE );
imshow("Processed", processed);
namedWindow("Boxes", WINDOW_AUTOSIZE );
imshow("Boxes", finalBoxes);
std::cout<<"waiting for user input"<<std::endl;
waitKey(0);
return 0;
}
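Since the question was asked in Python, here is a rough sketch of the same three steps (threshold, keep the largest dark blob, collect the enclosed background regions). It is not a translation of the C++ above: the function names `label_components` and `find_boxes` are made up for illustration, the flood fill is plain NumPy and Python so the example stays self-contained, and instead of skipping the first background region found it drops any region that touches the page border. On real scans a library routine such as `cv2.connectedComponentsWithStats` would likely be much faster for the labelling step.

```python
from collections import deque

import numpy as np


def label_components(mask, connectivity=4):
    """Flood-fill every True region of a boolean mask.

    Returns the label image and, per label, [size, x_min, y_min, x_max, y_max].
    """
    height, width = mask.shape
    labels = np.zeros((height, width), dtype=np.int32)
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    stats = []
    for seed_y, seed_x in zip(*np.nonzero(mask)):
        seed_y, seed_x = int(seed_y), int(seed_x)
        if labels[seed_y, seed_x]:
            continue  # already part of an earlier blob
        label = len(stats) + 1
        labels[seed_y, seed_x] = label
        blob = [0, seed_x, seed_y, seed_x, seed_y]
        queue = deque([(seed_y, seed_x)])
        while queue:
            y, x = queue.popleft()
            blob[0] += 1
            blob[1], blob[2] = min(blob[1], x), min(blob[2], y)
            blob[3], blob[4] = max(blob[3], x), max(blob[4], y)
            for dy, dx in steps:
                ny, nx = y + dy, x + dx
                if (0 <= ny < height and 0 <= nx < width
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = label
                    queue.append((ny, nx))
        stats.append(blob)
    return labels, stats


def find_boxes(gray, thresh=150):
    """Return (x, y, width, height) for every region enclosed by the grid."""
    ink = gray < thresh  # step 1: dark pixels (grid lines, text, noise)
    labels, stats = label_components(ink, connectivity=8)
    if not stats:
        return []
    # step 2: the largest dark blob is assumed to be the box grid
    grid_label = 1 + max(range(len(stats)), key=lambda i: stats[i][0])
    grid = labels == grid_label
    # step 3: each non-grid region that does not touch the page border is a box
    _, holes = label_components(~grid, connectivity=4)
    height, width = gray.shape
    boxes = []
    for _, x0, y0, x1, y1 in holes:
        if x0 > 0 and y0 > 0 and x1 < width - 1 and y1 < height - 1:
            boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes


# Synthetic page: one outer frame split into two boxes, plus a "text" blob
page = np.full((40, 60), 255, dtype=np.uint8)
page[5, 5:55] = page[34, 5:55] = 0
page[5:35, 5] = page[5:35, 54] = page[5:35, 30] = 0
page[10:13, 10:15] = 0  # stand-in for printed text inside the left box
print(find_boxes(page))  # [(6, 6, 24, 28), (31, 6, 23, 28)]
```

Each returned rectangle can then be cropped from the original scan and passed to OCR on its own.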
I am trying to translate the green screen sample (https://github.com/microsoft/Azure-Kinect-Sensor-SDK/blob/develop/examples/green_screen/main.cpp) from c++ to python, however, there is one part I cannot figure out.
This is the C++ code from the sample:
k4a::image main_color_image = captures[0].get_color_image();
k4a::image main_depth_image = captures[0].get_depth_image();
// let's green screen out things that are far away.
// first: let's get the main depth image into the color camera space
k4a::image main_depth_in_main_color = create_depth_image_like(main_color_image);
main_depth_to_main_color.depth_image_to_color_camera(main_depth_image, &main_depth_in_main_color);
cv::Mat cv_main_depth_in_main_color = depth_to_opencv(main_depth_in_main_color);
cv::Mat cv_main_color_image = color_to_opencv(main_color_image);
// single-camera case
cv::Mat within_threshold_range = (cv_main_depth_in_main_color != 0) &
(cv_main_depth_in_main_color < depth_threshold);
// show the close details
cv_main_color_image.copyTo(output_image, within_threshold_range);
// hide the rest with the background image
background_image.copyTo(output_image, ~within_threshold_range);
cv::imshow("Green Screen", output_image);
In Python, using pyk4a, I have translated it to:
capture = k4a.get_capture()
if(np.any(capture.depth) and np.any(capture.color)):
    color = capture.color
    depth_in_color_camera_space = capture.transformed_depth
    depth_in_color_camera_space = cv2.normalize(depth_in_color_camera_space, None, 0, 255, cv2.NORM_MINMAX, dtype=cv2.CV_8U)
However, now I am stuck on creating the mask. I have tried multiple things with cv copyTo, threshold and bitwise_and, but none worked for me.
To help understand the C++ and python code better:
create_depth_image_like():
This is a method provided by the sample that creates a k4a::image. This is not needed for python because it just provides an array.
static k4a::image create_depth_image_like(const k4a::image &im)
{
return k4a::image::create(K4A_IMAGE_FORMAT_DEPTH16,
im.get_width_pixels(),
im.get_height_pixels(),
im.get_width_pixels() * static_cast<int>(sizeof(uint16_t)));
}
depth_image_to_color_camera():
This method is provided by the Azure Kinect SDK and is available in python as capture.transformed_depth. It transforms the depth image to be in the same image space as the color image or vice versa.
depth_to_opencv():
I am not sure if I need to use this method. Also, I don't really know what it does to be honest.
static cv::Mat depth_to_opencv(const k4a::image &im)
{
return cv::Mat(im.get_height_pixels(),
im.get_width_pixels(),
CV_16U,
(void *)im.get_buffer(),
static_cast<size_t>(im.get_stride_bytes()));
}
color_to_opencv():
Same goes for this one. I think I don't need them because python returns an array already, however, I am not sure.
static cv::Mat color_to_opencv(const k4a::image &im)
{
cv::Mat cv_image_with_alpha(im.get_height_pixels(), im.get_width_pixels(), CV_8UC4, (void *)im.get_buffer());
cv::Mat cv_image_no_alpha;
cv::cvtColor(cv_image_with_alpha, cv_image_no_alpha, cv::COLOR_BGRA2BGR);
return cv_image_no_alpha;
}
So, what's left for me to translate to Python is the mask (defined in the C++ code as within_threshold_range). However, this is the part I just can't figure out. Any help would be greatly appreciated!
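One possible NumPy reading of the `within_threshold_range` expression and the two `copyTo` calls is sketched below. It assumes the depth array is still in its raw units (not rescaled to 8 bits, since a `cv2.normalize` to 0..255 would change what a depth threshold means), that it has the same height and width as the color image, and that `background` has the same shape as `color`. The function name `green_screen` and the default threshold are placeholders, not part of pyk4a or the SDK sample.

```python
import numpy as np


def green_screen(color, depth, background, depth_threshold=1000):
    """Keep color pixels with valid depth closer than the threshold."""
    # Same test as the C++ sample: depth is non-zero AND below the threshold
    within_threshold_range = (depth != 0) & (depth < depth_threshold)
    # Broadcast the 2-D mask over the channel axis and pick per pixel
    return np.where(within_threshold_range[..., None], color, background)


# Tiny synthetic example: 2x2 image, depth values in raw units
color = np.full((2, 2, 3), 200, dtype=np.uint8)
background = np.zeros((2, 2, 3), dtype=np.uint8)
depth = np.array([[0, 500], [1500, 999]], dtype=np.uint16)
output_image = green_screen(color, depth, background)
```

The boolean mask plays the role of both `within_threshold_range` and `~within_threshold_range`, because `np.where` takes the foreground where the mask is true and the background everywhere else.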
I'm working to detect cells within microscope images like the one below. There are often spurious contours that get drawn due to imperfections on the microscope slides, like the one below the legend in the figure below.
I'm currently using this solution to clean these up. Here's the basic idea.
# Create image of background
blank = np.zeros(image.shape[0:2])
background_image = cv2.drawContours(blank.copy(), background_contour, 0, 1, -1)
for i, c in enumerate(contours):
    # Create image of contour
    contour_image = cv2.drawContours(blank.copy(), contours, i, 1, -1)
    # Create image of focal contour + background
    total_image = np.where(background_image+contour_image>0, 1, 0)
    # Check if contour is outside positive space
    if total_image.sum() > background_image.sum():
        continue
This works as expected; if the total_image area is greater than the area of the background_image then c must be outside the region of interest. But drawing all of these contours is incredibly slow and checking thousands of contours takes hours. Is there a more efficient way to check if contours overlap that doesn't require drawing the contours?
I assume the goal is to exclude the external contour from further analysis? If so, the easiest is to use the red background contour as a mask. Then use the masked image to detect the blue cells.
# Create image of background
blank = np.zeros(image.shape[0:2], dtype=np.uint8)
background_image = cv2.drawContours(blank.copy(), background_contour, 0, (255), -1)
# mask input image (leaves only the area inside the red background contour)
res = cv2.bitwise_and(image,image,mask=background_image )
#[detect blue cells]
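If the contours have already been found and only need filtering, a related shortcut is to draw the background mask once and then look up each contour's own boundary points in it, which avoids drawing any contour. This is a different technique from the masking above, sketched under two assumptions: the contours are in OpenCV's usual (N, 1, 2) layout of (x, y) points, and the background region has no holes, so a contour whose boundary lies inside the mask is itself inside. The name `contour_inside_mask` is made up for the example.

```python
import numpy as np


def contour_inside_mask(contour, mask):
    """True if every boundary point of the contour lies on a non-zero mask pixel."""
    points = np.asarray(contour).reshape(-1, 2)  # columns are x, y
    return bool(np.all(mask[points[:, 1], points[:, 0]] > 0))


# Synthetic check: the mask covers rows and columns 2..7 of a 10x10 image
mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:8, 2:8] = 255
inside = np.array([[[3, 3]], [[5, 3]], [[5, 5]], [[3, 5]]])
outside = np.array([[[1, 1]], [[3, 1]], [[3, 3]], [[1, 3]]])
kept = [c for c in (inside, outside) if contour_inside_mask(c, mask)]
```

The cost per contour is one array lookup over its points, so thousands of contours should take well under a second rather than hours.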
Assuming you are trying to find points that overlap between the different contours,
consider the contours as
vector<vector<Point> > contours;
..... //obtain your contours.
vector<Point> non_repeating_points;
for(size_t i=0;i<contours.size();i++)
{
    for(size_t j=0;j<contours[i].size();j++)
    {
        Point this_point = contours[i][j];
        bool seen_before = false;
        for(size_t k=0;k<non_repeating_points.size();k++)
        {//check this list for previous record
            if(non_repeating_points[k] == this_point)
            {
                std::cout<< "found repeat points at "<< std::endl;
                std::cout<< this_point << std::endl;
                seen_before = true;
                break;
            }
        }
        //if not seen before just add it in the list
        if(!seen_before)
            non_repeating_points.push_back(this_point);
    }
}
I just wrote this without compiling it, but I think you can understand the idea.
The information you provided is not enough.
In case you mean to find the nearest connected boundary, where there is no exact overlap,
you can declare a local cluster near the point non_repeating_points[k]. Call it surround_non_repeating_points[k].
You can control the distance that counts as an intersection and push all points within it into surround_non_repeating_points[k].
Then just check in a loop for
if(surround_non_repeating_points[k] == this_point)
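For the exact-match case, the same repeated-point check can be sketched in Python with a set, which replaces the inner linear search with a hash lookup. This assumes contours in OpenCV's (N, 1, 2) layout; the function name `find_repeated_points` is illustrative only, and like the C++ snippet it also reports a point that repeats within a single contour.

```python
import numpy as np


def find_repeated_points(contours):
    """Return every (x, y) point that was already seen earlier in the scan."""
    seen = set()
    repeats = []
    for contour in contours:
        for x, y in np.asarray(contour).reshape(-1, 2):
            key = (int(x), int(y))
            if key in seen:
                repeats.append(key)
            else:
                seen.add(key)
    return repeats


contours = [
    np.array([[[0, 0]], [[1, 0]], [[1, 1]]]),
    np.array([[[1, 1]], [[2, 2]]]),
]
print(find_repeated_points(contours))  # [(1, 1)]
```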
I need to draw "soft" white circles (translucent borders) onto an image with OpenCV, but all I can find in the docs is how to draw 100% opaque circles with hard borders. Does anyone know how I could do this, or at least create the illusion that the circles "fade out" at the edges?
I felt like working on my OpenCV skills a bit - and learned quite a lot - cool question!
I generated a single channel image of alpha values - float to get fewer rounding errors, and single channel to save some memory. This represents how much of your circle is visible over the background.
The circle has an outer radius - the point at which it becomes fully transparent and an inner radius, the point where it stops being fully opaque. Radii between these two will be faded. So, set the IRADIUS very close to the ORADIUS for a steep, rapid falloff and set it a long way away for a slower tapering out.
I used an ROI to position the circle on the background and to speed things up by only iterating over the necessary rectangle of the background.
The only tricky part is alpha blending or compositing. You just have to know the formula for each pixel in the output image is:
out = (alpha * foreground) + (1-alpha) * background
Here is the code. I am not the world's best at OpenCV so there may be parts that can be optimised!
////////////////////////////////////////////////////////////////////////////////
// main.cpp
// Mark Setchell
////////////////////////////////////////////////////////////////////////////////
#include <opencv2/opencv.hpp>
#include <vector>
#include <cstdlib>
using namespace std;
using namespace cv;
#define ORADIUS 100 // Outer radius
#define IRADIUS 80 // Inner radius
int main()
{
// Create a blue background image
Mat3b background(400,600,Vec3b(255,0,0));
// Create alpha layer for our circle normalised to 1=>solid, 0=>transparent
Mat alpha(2*ORADIUS,2*ORADIUS,CV_32FC1);
// Now draw a circle in the alpha channel
for(auto r=0;r<alpha.rows;r++){
for(auto c=0;c<alpha.cols;c++){
int x=ORADIUS-r;
int y=ORADIUS-c;
float radius=hypot((float)x,(float)y);
auto& pixel = alpha.at<float>(r,c);
if(radius>ORADIUS){ pixel=0.0; continue;} // transparent
if(radius<IRADIUS){ pixel=1.0; continue;} // solid
pixel=1-((radius-IRADIUS)/(ORADIUS-IRADIUS)); // partial
}
}
// Create solid magenta rectangle for circle
Mat3b circle(2*ORADIUS,2*ORADIUS,Vec3b(255,0,255));
#define XPOS 20
#define YPOS 120
// Make an ROI on background where we are going to place circle
Rect ROIRect(XPOS,YPOS,ORADIUS*2,ORADIUS*2);
Mat ROI(background,ROIRect);
// Do the alpha blending thing
Vec3b *thisBgRow;
Vec3b *thisFgRow;
float *thisAlphaRow;
for(int j=0;j<ROI.rows;++j)
{
thisBgRow = ROI.ptr<Vec3b>(j);
thisFgRow = circle.ptr<Vec3b>(j);
thisAlphaRow = alpha.ptr<float>(j);
for(int i=0;i<ROI.cols;++i)
{
for(int c=0;c<3;c++){ // iterate over channels, result=circle*alpha + (1-alpha)*background
thisBgRow[i][c] = saturate_cast<uchar>((thisFgRow[i][c]*thisAlphaRow[i]) + ((1.0-thisAlphaRow[i])*thisBgRow[i][c]));
}
}
}
imwrite("result.png",background);
return 0;
}
This is with IRADIUS=80:
This is with IRADIUS=30:
Kudos and thanks to @Micka for sharing his code for iterating over a ROI here.
Oooops, I just realised you were looking for a Python solution. Hopefully my code will give you some ideas for generating the soft circle mask, and I found an article here that shows you some Python-style ways of doing it that you can mash up with my code.
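As a starting point for a Python version, the alpha layer and the blend can be written with NumPy array operations instead of per-pixel loops. This is a sketch of the same idea rather than a line-by-line port: the function names are invented, the circle is white as in the question, and it assumes the inner radius is smaller than the outer radius and that the circle's bounding square lies fully inside the background.

```python
import numpy as np


def soft_circle_alpha(outer_radius, inner_radius):
    """Alpha layer: 1 inside the inner radius, 0 outside the outer, linear between."""
    size = 2 * outer_radius
    yy, xx = np.mgrid[0:size, 0:size]
    radius = np.hypot(xx - outer_radius, yy - outer_radius)
    alpha = 1.0 - (radius - inner_radius) / float(outer_radius - inner_radius)
    return np.clip(alpha, 0.0, 1.0).astype(np.float32)


def blend_circle(background, x, y, color, alpha):
    """Composite a solid color onto background at (x, y): a*fg + (1-a)*bg."""
    height, width = alpha.shape
    roi = background[y:y + height, x:x + width].astype(np.float32)
    a = alpha[..., None]  # broadcast the single-channel alpha over the channels
    blended = a * np.asarray(color, dtype=np.float32) + (1.0 - a) * roi
    background[y:y + height, x:x + width] = np.clip(np.rint(blended), 0, 255).astype(np.uint8)
    return background


# Blue BGR background, white soft circle placed as in the C++ example
background = np.zeros((400, 600, 3), dtype=np.uint8)
background[:] = (255, 0, 0)
alpha = soft_circle_alpha(outer_radius=100, inner_radius=80)
result = blend_circle(background, x=20, y=120, color=(255, 255, 255), alpha=alpha)
```

The resulting array can be written out with `cv2.imwrite` as usual.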
I am an undergraduate student. I am new to image processing and python.
I have many images of plant samples and their descriptions (called labels, which are stuck on the sample), as shown in the figure below. I need to automatically segment only those labels from the sample.
I tried thresholding based on colour, but it failed. Could you please suggest an example for this task? I need some ideas or code to make the segmentation completely automatic.
Please help me if you are experts in image processing and Python, I need your help to complete this task.
The rectangle is detected on the Top Left, but it should be on bottom right. Could you please tell me where is my mistake and how to correct it.
I have also given the code below.
You can try template matching with a big white rectangle to identify the area where the information is stored.
http://docs.opencv.org/3.1.0/d4/dc6/tutorial_py_template_matching.html#gsc.tab=0
Once that is done, you will be able to recognize characters in this area. Save a small subimage, and with a tool like pytesseract you will be able to read the characters.
https://pypi.python.org/pypi/pytesseract
There are other OCR options here, with some examples:
https://saxenarajat99.wordpress.com/2014/10/04/optical-character-recognition-in-python/
Good luck !
Why use a color threshold? I tried this one with ImageJ and got nice results. I just converted the image to 8-bit and binarized it using a fixed threshold (166 in this case). You can choose the best threshold from the image histogram.
Then you just need to find your white rectangle region and read the characters like FrsECM suggested.
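If picking the threshold by eye from the histogram is not practical for many images, one common automatic choice is Otsu's method, which selects the level that best separates the histogram into two classes. The sketch below is a plain NumPy version for 8-bit grayscale input; it is an alternative to the fixed value used here, not what was done in ImageJ, and the function name `otsu_threshold` is chosen for the example. OpenCV offers the same idea through the `cv2.THRESH_OTSU` flag of `cv2.threshold`.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the level t that maximizes between-class variance (classes: <= t and > t)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    weight_low = np.cumsum(hist)               # pixel count at or below each level
    weight_high = hist.sum() - weight_low      # pixel count above each level
    sum_low = np.cumsum(hist * levels)
    sum_high = sum_low[-1] - sum_low
    valid = (weight_low > 0) & (weight_high > 0)
    between = np.zeros(256, dtype=np.float64)
    mean_low = sum_low[valid] / weight_low[valid]
    mean_high = sum_high[valid] / weight_high[valid]
    between[valid] = weight_low[valid] * weight_high[valid] * (mean_low - mean_high) ** 2
    return int(np.argmax(between))


# Synthetic example: dark plant pixels (50) and a bright label (200)
gray = np.full((20, 20), 50, dtype=np.uint8)
gray[5:15, 5:15] = 200
t = otsu_threshold(gray)
binary = gray > t  # True on the bright label
```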
Here's an example in C++:
#include "opencv2/imgproc.hpp"
#include "opencv2/highgui.hpp"
#include <stdlib.h>
#include <stdio.h>
using namespace cv;
/// Global variables
int threshold_nvalue = 166;
const int thresh_increment = 2;
int threshold_type = THRESH_BINARY;//1
int const max_value = 255;
int const morph_size = 3;
int const min_blob_size = 1000;
Mat src, src_resized, src_gray, src_thresh, src_morph;
/**
* @function main
*/
int main(int argc, char** argv)
{
/// Load an image
src = imread("C:\\Users\\phili\\Pictures\\blatt.jpg", 1);
//Resize for displaying it properly
resize(src, src_resized, Size(600, 968));
/// Convert the image to Gray
cvtColor(src_resized, src_gray, COLOR_BGR2GRAY); //imread loads images in BGR order
/// Region of interest
Rect label_rect;
//Binarization using fixed threshold
threshold(src_gray, src_thresh, threshold_nvalue, max_value, threshold_type);
//Erase small objects using morphology
Mat element = getStructuringElement(0, Size(2 * morph_size + 1, 2 * morph_size + 1), Point(morph_size, morph_size));
morphologyEx(src_thresh, src_morph, MORPH_CLOSE, element);
//find white objects and their contours
std::vector<std::vector<Point> > contours;
std::vector<Vec4i> hierarchy;
findContours(src_morph, contours, hierarchy, RETR_TREE, CHAIN_APPROX_NONE, Point(0, 0));
for (std::vector<std::vector<Point> >::iterator it = contours.begin(); it != contours.end(); ++it)
{
//just big blobs
if (it->size()>min_blob_size)
{
//approx contour and check for rectangle
std::vector<Point> approx;
approxPolyDP(*it, approx, 0.01*arcLength(*it, true), true);
if (approx.size() == 4)
{
//just for visualization
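//drawContours expects a list of contours, so wrap the single polygon
drawContours(src_resized, std::vector<std::vector<Point> >(1, approx), 0, Scalar(0, 255, 255), -1);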
//bounding rect for ROI
label_rect = boundingRect(approx);
//exit loop
break;
}
}
}
//Region of interest
Mat label_roi = src_resized(label_rect);
//OCR comes here...
}
I've been trying to blend two images. My current approach is to obtain the coordinates of the overlapping region of the two images and, only for the overlapping region, blend with a hardcoded alpha of 0.5 before adding the images. So basically I'm just taking half the value of each pixel from the overlapping regions of both images and adding them. That doesn't give me a perfect blend because the alpha value is hardcoded to 0.5. Here's the result of blending 3 images:
As you can see, the transition from one image to another is still visible. How do I obtain the perfect alpha value that would eliminate this visible transition? Or is there no such thing, and I'm taking a wrong approach?
Here's how I'm currently doing the blending:
for i in range(3):
    base_img_warp[overlap_coords[0], overlap_coords[1], i] = base_img_warp[overlap_coords[0], overlap_coords[1],i]*0.5
    next_img_warp[overlap_coords[0], overlap_coords[1], i] = next_img_warp[overlap_coords[0], overlap_coords[1],i]*0.5
final_img = cv2.add(base_img_warp, next_img_warp)
If anyone would like to give it a shot, here are two warped images, and the mask of their overlapping region: http://imgur.com/a/9pOsQ
Here is the way I would do it in general:
#include <opencv2/opencv.hpp>
int main(int argc, char* argv[])
{
cv::Mat input1 = cv::imread("C:/StackOverflow/Input/pano1.jpg");
cv::Mat input2 = cv::imread("C:/StackOverflow/Input/pano2.jpg");
// compute the vignetting masks. This is much easier before warping, but I will try...
// it can be precomputed, if the size and position of your ROI in the image doesnt change and can be precomputed and aligned, if you can determine the ROI for every image
// the compression artifacts make it a little bit worse here, I try to extract all the non-black regions in the images.
cv::Mat mask1;
cv::inRange(input1, cv::Vec3b(10, 10, 10), cv::Vec3b(255, 255, 255), mask1);
cv::Mat mask2;
cv::inRange(input2, cv::Vec3b(10, 10, 10), cv::Vec3b(255, 255, 255), mask2);
// now compute the distance from the ROI border:
cv::Mat dt1;
cv::distanceTransform(mask1, dt1, CV_DIST_L1, 3);
cv::Mat dt2;
cv::distanceTransform(mask2, dt2, CV_DIST_L1, 3);
// now you can use the distance values for blending directly. If the distance value is smaller this means that the value is worse (your vignetting becomes worse at the image border)
cv::Mat mosaic = cv::Mat(input1.size(), input1.type(), cv::Scalar(0, 0, 0));
for (int j = 0; j < mosaic.rows; ++j)
for (int i = 0; i < mosaic.cols; ++i)
{
float a = dt1.at<float>(j, i);
float b = dt2.at<float>(j, i);
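float alpha = (a + b > 0) ? a / (a + b) : 0.0f; // distances are not between 0 and 1 but this value is. The "better" a is, compared to b, the higher is alpha. Where both distances are 0 (outside both images) alpha is set to 0 to avoid dividing by zero.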
// actual blending: alpha*A + beta*B
mosaic.at<cv::Vec3b>(j, i) = alpha*input1.at<cv::Vec3b>(j, i) + (1 - alpha)* input2.at<cv::Vec3b>(j, i);
}
cv::imshow("mosaic", mosaic);
cv::waitKey(0);
return 0;
}
Basically you compute the distance from your ROI border to the center of your objects and compute the alpha from both blending mask values. So if one image has a high distance from the border and the other one a low distance from the border, you prefer the pixel that is closer to its image center. It would be better to normalize those values for cases where the warped images aren't of similar size.
But even better and more efficient is to precompute the blending masks and warp them. Best would be to know the vignetting of your optical system and choose an identical blending mask (typically with lower values at the border).
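For the Python side of the question, the distance-weighted alpha can be sketched with NumPy alone. The distance transform here is a simple stand-in built from repeated 4-neighbour erosion, which yields the city-block (L1) distance to the nearest invalid pixel; unlike `cv::distanceTransform`, it also treats everything outside the array as invalid. The function names are invented for the example, and both images are assumed to be already warped onto the same canvas with boolean validity masks.

```python
import numpy as np


def l1_border_distance(mask):
    """City-block distance of each True pixel to the nearest False pixel.

    Pixels outside the array count as False, so the distance also falls off
    towards the edge of the canvas.
    """
    dist = np.zeros(mask.shape, dtype=np.float32)
    current = mask.astype(bool)
    while current.any():
        dist += current  # every surviving pixel is one step further from the border
        padded = np.pad(current, 1, mode="constant", constant_values=False)
        # 4-neighbour erosion: a pixel survives only if all four neighbours are set
        current = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
                   & padded[1:-1, :-2] & padded[1:-1, 2:])
    return dist


def distance_blend(img1, mask1, img2, mask2):
    """Blend with alpha = d1 / (d1 + d2), leaving alpha at 0 where both are 0."""
    d1 = l1_border_distance(mask1)
    d2 = l1_border_distance(mask2)
    total = d1 + d2
    alpha = np.divide(d1, total, out=np.zeros_like(d1), where=total > 0)
    alpha = alpha[..., None]
    mosaic = alpha * img1 + (1.0 - alpha) * img2
    return np.clip(np.rint(mosaic), 0, 255).astype(np.uint8)


# Two flat-coloured strips that overlap in columns 4..7 of a 5x12 canvas
img1 = np.zeros((5, 12, 3), dtype=np.uint8)
img1[:, :8] = 100
img2 = np.zeros((5, 12, 3), dtype=np.uint8)
img2[:, 4:] = 200
mosaic = distance_blend(img1, img1[..., 0] > 0, img2, img2[..., 0] > 0)
```

In the middle row of this example the values step from 100 up to 200 across the overlap instead of jumping at a seam, which is the effect a fixed alpha of 0.5 cannot give.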
The C++ code above gives these results:
ROI masks:
Blending masks (just as an impression, must be float matrices instead):
image mosaic:
There are 2 obvious problems with your images:
1. Border area has distorted lighting conditions
That is most likely caused by the optics used to acquire the images. To remedy that, you should use only the inner part of the images (cut off a few pixels from the border).
When I cut off 20 pixels from the border and blended to common illumination, I got this:
As you can see, the ugly border seam is gone now; only the illumination problem persists (see bullet #2).
2. Images are taken at different lighting conditions
Here the subsurface scattering effects kick in, making the images "not compatible". You should normalize them to some uniform illumination, or post-process the blended result line by line and, when a coherent bump is detected, multiply the rest of the line so that the bump is diminished.
So the rest of the line should be multiplied by the constant i0/i1. These kinds of bumps can occur only on the edges between overlap values, so you can either scan for them or use those positions directly. To be recognized as a valid bump, it should have neighbors nearby in the previous and next lines along the whole image height.
You can also do this in the y axis direction in the same way.
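The per-line correction might look roughly like the sketch below. It assumes that i0 and i1 are the mean intensities in a small window just before and just after the bump, that the bump position (`seam`) is already known, and that the line is a single-channel row of pixels; the name `flatten_seam` and the window size are illustrative only.

```python
import numpy as np


def flatten_seam(row, seam, window=5):
    """Scale the part of a row after the seam so its level matches the part before."""
    row = row.astype(np.float32)
    if seam <= 0 or seam >= len(row):
        return row  # nothing on one side of the seam to compare against
    i0 = row[max(seam - window, 0):seam].mean()  # level just before the bump
    i1 = row[seam:seam + window].mean()          # level just after the bump
    if i1 > 0:
        row[seam:] *= i0 / i1
    return row


# A row that jumps from 100 to 120 at column 10
row = np.array([100] * 10 + [120] * 10, dtype=np.uint8)
corrected = flatten_seam(row, seam=10)
```

Applied to every row of the blended image (and, if needed, to every column for the y direction), this removes a constant brightness step but not gradual lighting differences.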