OCR an iOS/Android messaging app screenshot - python

I have a project where I need to take a screenshot from a messaging application and convert it into a machine-readable format (perhaps JSON). I'm asking if you can outline a basic approach for my algorithm. I intend to write my algorithm in Python.
How to persist the back-and-forth/conversation format. Should I split the source image into separate chunks, one chunk for each blue/white speech bubble? I would subsequently feed those individual speech bubbles into an OCR engine, and maintain ordering.
Which OCR engine would perform best for screenshots? Obviously my source images are not handwritten. The text is machine printed with a designated font and font size. The screenshots, thanks to today's "retina" displays, are high resolution but still have a low DPI. Should I rescale/resize the images?
How can I handle emojis? The messaging app user has an option to insert an emoji. Again, the set of emojis is well defined. Can an OCR program be taught to learn these characters?
Image for reference
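To make the target format concrete, here is a minimal sketch of the JSON I have in mind. It assumes each speech bubble has already been cropped and run through OCR; the dictionary keys (`top`, `color`, `text`) and the rule that blue bubbles are mine are assumptions for illustration only.

```python
import json


def bubbles_to_json(bubbles):
    """Serialize OCR'd speech bubbles as an ordered conversation.

    Each bubble is a dict with hypothetical keys:
      top   -- y coordinate of the bubble in the screenshot (pixels)
      color -- "blue" or "white" (assumed: blue means sent by me)
      text  -- the OCR result for that bubble
    """
    # Bubbles higher up in the screenshot were sent earlier.
    ordered = sorted(bubbles, key=lambda b: b["top"])
    messages = [
        {
            "index": i,
            "sender": "me" if b["color"] == "blue" else "them",
            "text": b["text"],
        }
        for i, b in enumerate(ordered)
    ]
    # ensure_ascii=False keeps emoji and other non-ASCII text readable.
    return json.dumps({"messages": messages}, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(bubbles_to_json([
        {"top": 310, "color": "blue", "text": "On my way"},
        {"top": 120, "color": "white", "text": "Where are you?"},
    ]))
```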

Related

OCR for getting text from images

I need to get the text from a few images of a meter; specifically, I need to read the text shown on the meter's LCD. I have tried several approaches without success.
Use Google Cloud Platform(GCP). You have a lot of APIs for computer vision.
In GCP they also provide an API to detect text in images.
Check the link below for a detailed description of how it works.
GCP OCR Documentation
If this is not what you are looking for, please describe your question in more detail: which platform you are working on and what you have tried so far.
I agree with what @AntoPravin has answered. Also, in answer to your comment, I'd like to point out that GCP text detection is far more powerful than the Microsoft vision API. I've personally compared Google's vision API, the Microsoft vision API, and Tesseract, and GCP is miles ahead of the other two. GCP is able to detect almost everything that you can see with your naked eye.
I tried GCP on your image. These are the results and as you can see, I'm able to get the reading of the meter. Getting the numerical value from this text is not a problem. You can use regex for that.
LOBAT
15.4
mv
SHUNT
ATXP 010
ON
LOBAT 15.4 mv SHUNT ATXP 010 ON
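As a rough sketch of the regex step, assuming the reading is the first decimal number in the detected text (the function name `extract_reading` is made up for this example):

```python
import re

# Matches a signed or unsigned decimal number such as "15.4" or "-0.25".
DECIMAL_PATTERN = re.compile(r"[-+]?\d+\.\d+")


def extract_reading(ocr_text):
    """Return the first decimal number found in the OCR text, or None."""
    match = DECIMAL_PATTERN.search(ocr_text)
    return float(match.group()) if match else None


if __name__ == "__main__":
    print(extract_reading("LOBAT 15.4 mv SHUNT ATXP 010 ON"))
```

Note that "010" is skipped because it has no decimal point; if your meter can show whole numbers you would need a different pattern.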

Is there a Python library to create thumbnails for various document file formats?

I'd like to generate thumbnails from various "document" file formats such as odt, doc(x) and ppt(x) but also mp4, psd, tiff (and possibly others) from a Python application. As far as I know for each of these formats there is at least one open source application which can generate preview images/thumbnails (e.g. LibreOffice, ffmpeg) or at least extract embedded thumbnails (e.g. imagemagick).
My main problem is that each of these applications/libraries use different command line options so I'm looking for a Python library (or a unified CLI tool) which provides a high-level API to generate a thumbnail with specified dimensions, quality level given a filename and calls the appropriate external tool (ideally including catching exceptions, segfaults and timeouts). Bonus points if it can generate multiple thumbnails if requested (e.g. one per page, page X-Y, every Z seconds but at most N images).
Does anyone know such a library/utility? (Boundary condition: The files may contain sensitive material or might be quite big so this must work without any network communication, using an external web service is not possible.)
If there is no such thing in Python, a locally installable web service would be fine as well.
I ended up writing my own library (named anythumbnailer, MIT license) which worked well enough for my immediate needs. The library is not what I envisioned (only basic thumbnailing, no support for dimensions, …) but it can generate thumbnails for doc(x), xls(x), ppt(x), videos and pdf on Linux with the help of ffmpeg and LibreOffice.
You can look at preview-generator. It is a library for generating previews (thumbnails, pdf, text and json overviews) for all your file-based content. This module gives you access to jpeg, pdf, text, html and json previews of virtually any kind of file. It also includes a cache mechanism so you do not have to care about preview storage.
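If you end up rolling your own wrapper, the core idea can be sketched as a dispatch on file extension plus a guarded subprocess call. This is only an illustration, not how anythumbnailer or preview-generator work internally: the function names and extension sets are made up, and it assumes `soffice`, `ffmpeg` and ImageMagick's `convert` are on the PATH.

```python
import subprocess
from pathlib import Path

# Hypothetical extension groups; extend as needed.
OFFICE_EXTS = {".odt", ".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx"}
VIDEO_EXTS = {".mp4"}
IMAGE_EXTS = {".psd", ".tif", ".tiff", ".pdf"}


def build_thumbnail_command(source, out_dir, size=(200, 200)):
    """Return the external command (as an argument list) for one file."""
    src = Path(source)
    out_dir = Path(out_dir)
    target = out_dir / (src.stem + ".png")
    ext = src.suffix.lower()
    if ext in OFFICE_EXTS:
        # LibreOffice picks the output name itself: <out_dir>/<stem>.png
        return ["soffice", "--headless", "--convert-to", "png",
                "--outdir", str(out_dir), str(src)]
    if ext in VIDEO_EXTS:
        # Grab a single frame one second in, scaled to the requested width.
        return ["ffmpeg", "-y", "-ss", "1", "-i", str(src),
                "-frames:v", "1", "-vf", "scale=%d:-1" % size[0], str(target)]
    if ext in IMAGE_EXTS:
        # "[0]" selects the first page or layer only.
        return ["convert", "%s[0]" % src, "-thumbnail",
                "%dx%d" % size, str(target)]
    raise ValueError("no thumbnailer registered for %r" % ext)


def make_thumbnail(source, out_dir, timeout=60):
    """Run the external tool, turning crashes and timeouts into one error."""
    cmd = build_thumbnail_command(source, out_dir)
    try:
        subprocess.run(cmd, check=True, timeout=timeout, capture_output=True)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired,
            FileNotFoundError) as exc:
        raise RuntimeError("thumbnailing failed: %s" % exc) from exc
    return Path(out_dir) / (Path(source).stem + ".png")
```

A tool killed by a signal (for example a segfault) surfaces as `CalledProcessError` with a negative return code, so it is covered by the same handler. Everything runs locally, with no network access.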

Image Optimization (Google App Engine with Python)

I haven't found an existing question that covers the kind of image optimization I'm looking for.
I've tested how much Facebook can optimize the image uploaded:
980KB --> 77KB
846KB --> 62.1KB
From what I found out, Facebook can shrink an image roughly tenfold or more while still retaining acceptable image quality, as in the test above.
So, can anyone share the best ways you have found to optimize images uploaded by users?
When I searched the internet, I saw some websites offering paid image optimization services. However, we prefer not to subscribe to any paid service at this stage.
I'm developing the project in Python on Google App Engine. Are there any Python libraries, or Google App Engine libraries, that we can reuse to achieve this?
Probably you should star this issue to get pngcrush like functionality added to the AppEngine images API.
Basic optimization boils down to:
Choosing the appropriate format for the image (usually jpeg for photographs; you can use jpeg across the board if you're not concerned about image quality but otherwise png for screenshots etc. may be wise)
Reducing the image to the smallest resolution appropriate for your application
Increasing the compression level to the highest level possible while maintaining your quality standards
You can also nitpick by stripping extraneous metadata, but that is usually unnecessary and not desirable.
If you want to do all of this in an automated fashion, you'll have to set a standard format and compression level across the board and accept that it won't be perfect in all cases, or else be able to determine what settings are appropriate for the image programmatically (which is quite difficult, unless you simply ask your users at upload time directly).
Normally I would use ImageMagick via the PythonMagick bindings for this task, but that may not be feasible on Google App Engine. In that case, maybe look at the Python Imaging Library.
Another solution is to use a third-party API; in this case you can use TinyPNG. Their compression algorithm is probably one of the best out there. Check their developer guide here:
https://tinypng.com/developers
The first 500 images per month are free; beyond that it's roughly $0.009 per image (more than 500 and fewer than 9,500 images) or $0.002 per image (more than 10,000 images).
You can't use PythonMagick, unfortunately. But the Python Imaging Library can be installed; see the Google Imaging Service documentation for how to use it.
There is no magic-bullet, Facebook-style optimization. You will have to develop your own that meets the standards you need. Most images these days are 5 MP and up, and resizing them to 1280x720 or less is normal for websites. The ability to crop extraneous parts of the image before resizing is also desirable.
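For the resizing step, a small sketch of the arithmetic: compute the largest size that fits inside a bounding box while keeping the aspect ratio, and never upscale. The function name `fit_within` and the 1280x720 default are just for illustration; the resulting size would then be passed to whatever resize call your imaging library offers.

```python
def fit_within(width, height, max_width=1280, max_height=720):
    """Return (new_width, new_height) that fits the box, keeping aspect ratio."""
    # The smaller of the two ratios decides the scale; 1.0 prevents upscaling.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # A typical 5 MP photo (2592x1944) shrinks to fit a 1280x720 box.
    print(fit_within(2592, 1944))
```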

Google App Engine (So "Pure Python"): Convert PDF to Image

In Google App Engine, I need to be able to take an uploaded PDF and convert it to an image (or maybe one day a number of tiled images) for storing and serving back out. Is there a library that will read PDF files that is also 100% python (so it can be uploaded with my app)?
From what I've gathered so far...
PIL does not read PDF files, only writes them.
GhostScript is the standard FOSS PDF reader, but I don't believe I'll be able to upload it with my app to GAE since I don't believe it's 100% python.
Is there anything else I might be able to use? Or maybe even a web service that I can call?
You may want to look into using the GAE Conversion API (not yet fully released). There's a tester signup form here, with a link to further details.
From the doc:
Conversions can be performed in any direction between PDF, HTML, TXT, and image formats, and OCR will be employed if necessary. Note that while PNG, GIF, JPEG, and BMP image formats are supported as input formats, only PNG is available for output.

Text to a PNG on App Engine (Python)

Note: I am cross-posting this from App Engine group because I got no answers there.
As part of my site about Japan, I have a feature where the user can get a large PNG for use as a desktop background that shows the user's name in Japanese. After switching my site hosting entirely to App Engine, I removed this particular feature because I could not find any way to render text to a PNG using the image API.
In other words, how would you go about outputting a Unicode string on top of an image of known dimensions (1024x768 for example), so that the text is as large as possible horizontally and centered vertically? Is there a way to do this in App Engine, or is there some external service besides App Engine that could make this easier for me, that you could recommend (besides running ImageMagick on your own server)?
Solution #1. Pure Python image library.
You can try to bundle PyPNG with your application. PyPNG is a pure Python library for creating PNG images. It depends on the zlib module, which is allowed on App Engine, so PyPNG should work there. Just write the PNG data to an in-memory file object (StringIO on Python 2, io.BytesIO on Python 3) instead of a file.
Shamelessly adapting the PyPNG example of how to make a bitmap PNG image (shown here in Python 3 syntax):
import png
from io import BytesIO
# bitmap data: one string of '0'/'1' pixels per row
s = ['110010010011',
     '101011010100',
     '110010110101',
     '100010010011']
# convert each row string into a list of integer pixel values
rows = [[int(c) for c in row] for row in s]
f = BytesIO()
w = png.Writer(len(rows[0]), len(rows), greyscale=True, bitdepth=1)
w.write(f, rows)
# binary PNG data, ready to be served or stored
png_data = f.getvalue()
print(len(png_data), 'bytes of PNG data')
I suspect suboptimal performance, but as far as I know there is no other way to generate images on GAE.
And you still need to figure out how to rasterize text to produce bitmap data. The easiest way, probably, is just to keep bitmaps of all the symbols around (essentially, using a bitmap font).
To render ASCII text with PyPNG, take a look at the texttopng script.
So, limitations are:
Probably slow (needs to be checked)
Glyph rasterization is to be addressed
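To illustrate the bitmap-font idea mentioned above, here is a minimal pure-Python sketch. The tiny 3x5 font table is invented for this example and covers only three characters; a real application would need a full glyph set for its script. The returned rows of 0/1 pixels have the same shape as the bitmap data fed to PyPNG in the example above.

```python
# Hypothetical 3x5 bitmap font: each glyph is five rows of '0'/'1' pixels.
GLYPHS = {
    "H": ["101", "101", "111", "101", "101"],
    "I": ["111", "010", "010", "010", "111"],
    " ": ["000"] * 5,
}
GLYPH_HEIGHT = 5


def render_text(text, spacing=1):
    """Rasterize text into a list of pixel rows (lists of 0/1 integers)."""
    rows = [[] for _ in range(GLYPH_HEIGHT)]
    for position, char in enumerate(text):
        glyph = GLYPHS[char]  # raises KeyError for characters not in the font
        for y in range(GLYPH_HEIGHT):
            if position > 0:
                # blank columns between neighbouring glyphs
                rows[y].extend([0] * spacing)
            rows[y].extend(int(pixel) for pixel in glyph[y])
    return rows


if __name__ == "__main__":
    for row in render_text("HI"):
        print("".join("#" if pixel else "." for pixel in row))
```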
Solution #2. Off-site text-to-image rendering.
Google AppEngine does not provide tools to render text as raster images, but Google Charts does. With a proper choice of parameters, the outline text chart just renders simple text to PNG images.
For example, http://chart.apis.google.com/chart?chst=d_text_outline&chld=000000|32|h|FFFFFF|_|Render text to image|with Google Charts.|Some Unicode too:|Здра́вствуйте|こんにちは|नमस्ते|你好|שלו produces this:
Limitations:
You cannot generate images bigger than 300000 pixels
Style and font customizations are limited
Some Unicode scripts are not available
White background only
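If you go the Google Charts route, the URL can be built rather than typed by hand. This is a sketch that only mirrors the `chst` and `chld` values from the example URL above; the meaning I attach to the individual `chld` fields (fill color, font size, alignment, outline color, weight) is my reading of that example and should be treated as an assumption, as should the endpoint still being reachable.

```python
from urllib.parse import urlencode

CHART_ENDPOINT = "http://chart.apis.google.com/chart"


def text_chart_url(lines, size=32, fill="000000", outline="FFFFFF"):
    """Build an outline-text chart URL for the given lines of text."""
    # Field order copied from the example URL: 000000|32|h|FFFFFF|_|line|line...
    fields = [fill, str(size), "h", outline, "_"] + list(lines)
    query = urlencode({"chst": "d_text_outline", "chld": "|".join(fields)})
    return CHART_ENDPOINT + "?" + query


if __name__ == "__main__":
    print(text_chart_url(["Render text to image", "こんにちは"]))
```

`urlencode` percent-encodes the pipe separators and the non-ASCII text as UTF-8, which avoids problems with pasting raw Unicode into a URL.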
I ran into this same problem with writing text to an image. The issue at hand is that any imaging libraries used on Google App Engine must be pure Python, which rules out PIL.
PyBMP
PyBMP is a pure-Python library that can do simple text rendering. From there you can use Google's imaging library to composite the resulting bitmap onto your other pictures. There's some sample code below. The downside is that the library lacks nicer features like anti-aliasing and fine control over fonts, so the text that it renders looks kind of crappy. It also may or may not handle Unicode well.
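# Module names below are inferred from how they are used in this snippet
import bmp
import bmpfont_Tw_Cen_MT_30
# `name` is the string to render, defined elsewhere in your handler
# Create the image: 300x35 pixels on a white background
text_img = bmp.BitMap(300, 35, bmp.Color.WHITE)
# bmpfont_Tw_Cen_MT_30 is a generated file using PyBMP's tool
text_img.setFont(bmpfont_Tw_Cen_MT_30.font_data)
text_img.setPenColor(bmp.Color.BLACK)
# Draw the text with its top-left corner at (0, 0)
text_img.drawText(name, 0, 0)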
After this you can use google's composite function on text_img.getBitmap() as you would any other image.
External Image Processing
If the text isn't good enough (it wasn't for my project), an alternative solution is to set up an external server on a service like Rackspace purely for image processing. Set up an HTTP handler that does your image processing with PIL, and then returns the resulting image. From there you can either
upload the result straight to your static file hosting server (like s3) or
get the generated-text image result with app engine's urlfetch library and do the rest of your compositing in app engine
Not pretty, but it gets the job done.
It's a bit too late but I was looking for the same. I managed to draw unicode string (here Devanagari) onto an image and save it as a '.png' file by doing the following:
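# -*- coding: utf-8 -*-
from PIL import Image, ImageDraw, ImageFont
# 16x16 greyscale ('L') image with a white background
img = Image.new('L', (16, 16), 255)
draw = ImageDraw.Draw(img)
text_to_draw = 'क'  # on Python 2 use unicode('क', 'utf-8')
# replace with the path to a TrueType font that contains the glyphs you need
font = ImageFont.truetype('Path/to/font/file', 12)
# fill=0 draws in black so the text is visible on the white background
draw.text((2, 2), text_to_draw, font=font, fill=0)
img.save('image.png')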
P.S. got help from other posts on stackoverflow
[Stop press: As a comment suggests, this answer doesn't work in Google App Engine.]
The Python Imaging Library (PIL) can accomplish this.
You can load in the image, draw Unicode text on it with the ImageDraw.text() function.
You may need to call ImageDraw.textsize() a few times with different font sizes to find the largest font that will fit.
Finally, you can save the .png image to a file (or serve it back directly).
Test with large images if you are running it from within the context of a web server, to make sure you can allocate sufficient memory to process large PNG files.
(Have I answered your question appropriately? I don't know if PIL is an option from within the Google App Engine.)
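The "try a few font sizes" step can be done as a binary search. Below is a sketch that is independent of any imaging library: `measure` is a hypothetical callable you supply that returns the rendered (width, height) of your text at a given font size, for example by loading the font at that size and asking your library for the text dimensions. It assumes the text only gets bigger as the font size grows.

```python
def largest_fitting_size(measure, max_width, max_height, lo=1, hi=512):
    """Return the largest font size whose text fits the box, or None.

    measure(size) must return (width, height) of the rendered text.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        width, height = measure(mid)
        if width <= max_width and height <= max_height:
            best = mid      # fits: remember it and try a larger size
            lo = mid + 1
        else:
            hi = mid - 1    # too big: try a smaller size
    return best


if __name__ == "__main__":
    # Stand-in measurement: the text is 3 units wide per point of font size.
    def fake_measure(size):
        return size * 3, size

    print(largest_fitting_size(fake_measure, 1024, 768))
```

With the default bounds this needs at most about ten measurements instead of one per candidate size.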
