I have to do the following task on Python and I have not idea where
to begin:
OCR of handwritten dates
Page/document orientation detection for pretreatment
Stamp and Logo Detection and classification
a. Orientation variation included
b. Quality degradation to be considered
c. Overlying Primary Content
Anybody could help me?
THANKS IN ADVANCE¡¡
You can ocrmypdf to extract text from a pdf. It will extract text from the page and return a pdf same like original pdf with text on it. For detection of logos, you need to implement a computer vision-based model. if you need more details then please specify your requirement in details
Related
I'm working on an application that would extract information from invoices that the user takes a picture of with his phone (using flask and pytesseract).
Everything works on the extraction and classification side for my needs, using the image_to_data method of pytesseract.
But the problem is on the pre-processing side.
I refine the image with greyscale filters, binarization, dilation, etc.
But sometimes the user will take a picture that has a specific angle, like this:
invoice
And then tesseract will return characters that don't make sense, or sometimes it will just return nothing.
At the moment I "scan" the image during pre-processing (I'm largely inspired by this tutorial: https://www.pyimagesearch.com/2014/09/01/build-kick-ass-mobile-document-scanner-just-5-minutes/), but it's not efficient at all.
Does anyone know a way to make it easier for tesseract to work on these type of images?
If not, should I focus on making this pre-processing scan thing?
Thank you for your help!
I've implemented CBIR app by using standard ConvNet approach:
Use Transfer Learning to extract features from the data set of images
Cluster extracted features via knn
Given search image, extract its features
Give top 10 images that are close to the image in hand in knn network
I am getting good results, but I want to further improve them by adding text search as well. For instance, when my image is the steering wheel of the car, the close results will be any circular objects that resemble a steering wheel for instance bike wheel. What would be the best possible way to input text say "car part" to produce only steering wheels similar to the search image.
I am unable to find a good way to combine ConvNet with text search model to construct improved knn network.
My other idea is to use ElasticSearch in order to do text search, something that ElasticSearch is good at. For instance, I would do a CBIR search described previously and out of the return results, I can look up their description and then use ElasticSearch on the subset of the hits to produce the results. Maybe tag images with classes and allow user to de/select groups of images of interest.
I don't want to do text search before image search as some of the images are poorly described so text search would miss them.
Any thoughts or ideas will be appreciated!
I have not found the original paper, but maybe you might find it interesting: https://www.slideshare.net/xavigiro/multimodal-deep-learning-d4l4-deep-learning-for-speech-and-language-upc-2017
It is about looking for a vector space where both images and text are (multimodal embedding). In this way you can find text similar to a images, images referring to a text, or use the tuple text / image to find similar images.
I think maybe this idea is an interesting point to start from.
I think i am looking for something simpler than detecting a document boundaries in a photo. I am only trying to flag photos which are mostly of documents rather than just a normal scene photo. is this an easier problem to solve?
Are the documents mostly white? If so, you could analyse the images for white content above a certain percentage. Generally text documents only have about 10% printed content on them in total.
Hi I tried to use pdfimages to extract ID images from my pdf resume files. However for some files they return also the icon, table lines, border images which are totally irrelevant.
Is there anyway I can limit it to only extract person photo? I am thinking if we can define a certain size constraints on the output?
You need a way of differentiating images found in the PDF in order to extract the ones of interest.
I believe you have the options of considering:
Image characteristics such as Width, Height, Bits Per Component, ColorSpace
Metadata information about the image (e.g. a XMP tag of interest)
Facial recognition of the person in the photo or Form recognition of the structure of the ID itself.
Extracting all of the images and then use some image processing code to analyze the images to identify the ones of interest.
I think 2) may be the most reliable method if the author of the PDF included such information with the photo IDs. 3) may be difficult to implement and get a reliable result from consistently. 1) will only work if that is a reliable means of identifying such photo IDs for your PDF documents.
Then you could key off of that information using your extraction tool (if it lets you do that). Otherwise you would need to write your own extraction tool using a PDF library.
I have some kinds of paper sheet and I am writing python script with opencv to recognize the same paper sheet to classify. I am stuck in how to find the same kind of paper sheet. For example, I attached two pic. Picture 1 is the template and picture 2 is some kind of paper I need to know if it is matching with the template. I don't need to match the text and I just need to match the form. I need to classify the same sheet in many of paper sheet.
I have adjust the skew of paper and detect some lines but I don't know how to match the lines and judge this paper sheet is the same kind with the template.
Is there any one can give me an advice for the matching algorithm?
I'm not sure if such paper form is rich enough in visual information for this solution, but I think you should start with feature detection and homography calculation (opencv tutorial: Features2D + Homography). From there you can try to adjust 2D features for your problem.
Check out the findContours and MatchShape function. Either way, you are much better off matching a specific visual ID within the form that is representative of the form. Like a really simple form of barcode.