I am trying to find the frequency at which certain words appear in different books using Python. To that end, I have attempted to find the bounding box around each word.
The input: https://www.dropbox.com/s/ib74y9wh2vrxlwi/textin.jpg
And the output I get after binarisation and other morphological operations for detecting the bounding boxes:
https://www.dropbox.com/s/9q4x61dyvstu5ub/textout.png
My question is this: I need to perform OCR using pytesser. My current implementation is quite dirty: I save each detected bounding box to a small PNG file, then run the pytesser code separately, looping through each of these small word images. This process hogs my system.
Is there some other way to feed my images (detected by the bounding boxes) directly into pytesser without saving them first?
After my code runs, I have a list of 544 (in this example) bounding boxes of the form
[minrow, mincol, maxrow, maxcol].
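A minimal sketch of the in-memory route, with pytesseract (the maintained successor to pytesser) swapped in, since its image_to_string accepts a PIL image directly, so no intermediate PNG files are needed. It assumes `img` is the binarised page as a uint8 numpy array and `boxes` is the [minrow, mincol, maxrow, maxcol] list above:

```python
import numpy as np
from PIL import Image
import pytesseract  # maintained successor to pytesser

def ocr_boxes(img: np.ndarray, boxes):
    words = []
    for minrow, mincol, maxrow, maxcol in boxes:
        crop = img[minrow:maxrow, mincol:maxcol]   # numpy slice, no file I/O
        text = pytesseract.image_to_string(
            Image.fromarray(crop),
            config="--psm 8",                      # treat the crop as a single word
        )
        words.append(text.strip())
    return words
```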
I am trying to implement an image recognition program and I need to remove (or "crop out") all text present in the image, so for example from this:
to this:
I already tried the Keras OCR method, but firstly I don't need the background blur, I simply need to delete the text, and secondly it takes a lot of time and CPU power. Is there an easier way to detect those text regions and simply crop them out of the picture?
One way is to detect the text with findContours: contours with an area below a threshold are letters. Then paint over those areas, or first find their bounding rectangles and paint over each one.
Text Extraction from image after detecting text region with contours
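A rough sketch of that contour approach, assuming a light background and a hypothetical area threshold of 500 that would need tuning per image:

```python
import cv2

img = cv2.imread("input.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# invert-threshold so dark text becomes white foreground
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    if cv2.contourArea(c) < 500:                 # small contours are letters
        x, y, w, h = cv2.boundingRect(c)
        # paint the bounding rectangle over with the background colour
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 255), -1)

cv2.imwrite("output.png", img)
```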
There is also pytesseract to detect letters and their regions, but I expect it to be heavier than the contour approach.
Here is an example project where I worked with pytesseract: How to obtain the best result from pytesseract?
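For the pytesseract route, a minimal sketch: image_to_data returns a bounding box for every recognized word, which can then be painted over (the filename here is a placeholder):

```python
import cv2
import pytesseract
from pytesseract import Output

img = cv2.imread("input.png")
data = pytesseract.image_to_data(img, output_type=Output.DICT)
for i, word in enumerate(data["text"]):
    if word.strip():                              # skip empty detections
        x, y = data["left"][i], data["top"][i]
        w, h = data["width"][i], data["height"][i]
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 255), -1)
```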
Say I have an image of a book page (something similar to what is pictured below) and want to generate a bounding box for the central image (outlined in green). How might I do this with Python? I've tried the usual edge-detection route but found it too slow, and it picks up too many edges within the actual image of interest. Meanwhile, libraries like detecto try to find objects within the images rather than just detecting a rectangular image region. I have about 100 of these that I'd like to process and generate bounding boxes for.
100 is too few for me to want to train any kind of AI model, but too many to do manually. Any thoughts on an approach?
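One non-learning approach that may be worth trying first: on a scanned page the photo is usually the largest dark connected region, so an Otsu threshold plus a morphological close and a largest-contour pass can give a usable box. A sketch, assuming reasonably clean scans (the kernel size is a guess to tune):

```python
import cv2

def image_bbox(path):
    """Guess the bounding box of the largest non-text block on a page scan."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # invert-threshold so ink/photo pixels become white foreground
    _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # close to merge the photo into one blob while text stays fragmented
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 25))
    closed = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    largest = max(contours, key=cv2.contourArea)
    return cv2.boundingRect(largest)  # (x, y, w, h)
```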
I want to detect the font of the text in an image so that I can do better OCR on it. Searching for a solution, I found this post. Although it may seem to be the same as my question, it does not exactly address my problem.
Background
For OCR I am using tesseract, which uses trained data for recognizing text. Training tesseract with lots of fonts reduces the accuracy, which is natural and understandable. One solution is to build multiple trained data sets, one per group of a few similar fonts, and then automatically use the appropriate data for each image. For this to work, we need to be able to detect the font in the image.
Number 3 in this answer uses OCR to isolate images of characters along with their recognized character, then generates the same character's image in each font and compares those with the isolated image. In my case the user would provide a bounding box and the character associated with it. But because I want to OCR Arabic script (which is cursive, and a character's shape may vary depending on which characters are adjacent to it), and because the bounding box may not actually be the minimal bounding box, I am not sure how I can do the comparison.
I believe the Hausdorff distance is not applicable here. Am I right?
Shape context may be a good fit(?), and there is a ShapeContextDistanceExtractor class in OpenCV, but I am not sure how to use it from opencv-python.
Thank you, and sorry for my bad English.
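In case it helps, a minimal sketch of ShapeContextDistanceExtractor from opencv-python. The filenames are hypothetical, and both contours are resampled to the same number of points, which is the usual practice with this extractor:

```python
import cv2
import numpy as np

def sample_contour(path, n=100):
    """Load a glyph image and resample its largest contour to n points."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    pts = max(contours, key=cv2.contourArea).reshape(-1, 2)
    idx = np.linspace(0, len(pts) - 1, n).astype(int)
    return pts[idx].reshape(n, 1, 2).astype(np.float32)

extractor = cv2.createShapeContextDistanceExtractor()
d = extractor.computeDistance(sample_contour("isolated_char.png"),
                              sample_contour("rendered_char.png"))
print(d)  # smaller distance means more similar shapes
```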
I am trying to determine if there are pills in the different cells of a pillbox. I have a Raspberry Pi camera that takes pictures of the pillbox at regular intervals. I would like to use computer vision with OpenCV and Python to determine which of the 14 cells have pills in them.
I have gotten as far as isolating the borders of the cells. How do I create opencv masks for the interiors of the cells?
I tried running a Hough transform on the processed image, but it does not accurately find lines corresponding to each wall.
Original image:
After processing, with the cell walls isolated:
You can try Hough line detection on the image to find line segments, and use that information either to close the gaps in your current result or to detect the cells using only the line segments.
The advantage of using line detection is that once you have the parameters of a line, you can extend it beyond what actually appears in the image. So even if a cell wall has a gap, you should be able to close it.
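A sketch of that idea with HoughLinesP; the parameters are guesses to tune. As a bonus, once the walls are closed, inverting the image and labelling connected components gives one mask per cell interior, which answers the masking question above:

```python
import cv2
import numpy as np

edges = cv2.imread("cell_walls.png", cv2.IMREAD_GRAYSCALE)  # the processed image
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=40, maxLineGap=10)

closed = edges.copy()
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        # extend each segment a little past both endpoints to bridge gaps
        d = np.array([x2 - x1, y2 - y1], dtype=float)
        d /= np.linalg.norm(d) + 1e-9
        p1 = (int(x1 - 15 * d[0]), int(y1 - 15 * d[1]))
        p2 = (int(x2 + 15 * d[0]), int(y2 + 15 * d[1]))
        cv2.line(closed, p1, p2, 255, 2)

# the cell interiors are now the enclosed background regions:
# invert and label them to get one integer-labelled mask per cell
n, labels = cv2.connectedComponents(cv2.bitwise_not(closed))
```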
I am analysing a greyscale recording of synapses, from which I would like to automatically extract regions of interest (ROIs) as sets of small 'cuts' of the whole animation, in order both to trace and account for the movement of the microscope and to extract the Z-axis profile of a particular ROI. This means I need to scan through the image, identify ROIs, and match them across frames, exporting the result as a set of frames. Common ROI-catching techniques (filtering, averaging over frames via Markov or Fourier methods, and then matching the points) produce images that are too blurred/skewed for further analysis, and they can't handle the amount of motion in the recording, so I'm trying to come up with a different ROI tracing and extraction mechanism. Any ideas?
To illustrate:
Video (originally a huge TIFF file, compressed to GIF for upload):
Something along the lines of this result:
(this is a cut made in ImageJ; preferably my program would either track my ROI or just cut out enough space to capture most of the ROI's on-screen appearance, but I'm not sure what else can be done)
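One mechanism that might be worth prototyping, sketched below: pick the ROI once in the first frame, then re-locate it in each subsequent frame with normalized cross-correlation (cv2.matchTemplate), which is fairly robust to brightness changes and yields a motion-compensated stack of cuts. The file name and ROI coordinates are placeholders, and the TIFF is assumed to load as a (frames, height, width) stack:

```python
import cv2
import numpy as np
import tifffile  # assumption: the recording is a multi-page TIFF stack

stack = tifffile.imread("recording.tif")        # (frames, height, width)
x, y, w, h = 100, 120, 32, 32                   # hypothetical initial ROI

template = stack[0, y:y + h, x:x + w].astype(np.float32)

cuts = []
for frame in stack:
    res = cv2.matchTemplate(frame.astype(np.float32), template,
                            cv2.TM_CCOEFF_NORMED)
    _, _, _, (tx, ty) = cv2.minMaxLoc(res)      # best-match top-left corner
    cuts.append(frame[ty:ty + h, tx:tx + w])    # motion-compensated cut

cuts = np.stack(cuts)  # (frames, h, w): the ROI's profile over time
```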