Image to Text scanning in Python-Django - python

I am currently working on a project in Python/Django. The user needs to input some data, which is currently done by typing. I want an alternative input method, preferably an image-to-text converter. Can anyone help me implement this in Django?

I suppose you are looking for pytesser, an OCR module for Python built on Tesseract. It converts the text in an image into a string.
pytesser can be downloaded from: https://code.google.com/p/pytesser/downloads/list
You will also need PIL to work with images in memory. It can be downloaded from: http://www.pythonware.com/products/pil/
Select the version of PIL that matches your Python version.
Hope this helps.
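As a rough illustration of the image-to-string step, here is a minimal sketch. It assumes pytesseract (a different Tesseract wrapper from the pytesser module named above) plus Pillow and a working Tesseract install; the function names image_to_text and clean_ocr_text are hypothetical.

```python
def clean_ocr_text(raw):
    """Collapse runs of whitespace and drop empty lines from raw OCR output."""
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def image_to_text(image_path, lang="eng"):
    """Run Tesseract on one image file and return the cleaned text."""
    # Assumption: pytesseract and Pillow are installed and the tesseract
    # binary is on PATH. Imported here so clean_ocr_text stays usable
    # even without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return clean_ocr_text(pytesseract.image_to_string(img, lang=lang))
```

In a Django view, one option would be to save the uploaded file, pass its path to image_to_text, and pre-fill the form with the result so the user can correct OCR mistakes before submitting.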

Related

How to fix extra boxes generated by pdf2image when converting a PDF to images?

I'm trying to convert a PDF to images using pdf2image, but the output contains extra boxes that are not in the original.
[Screenshot of the input PDF]
from pdf2image import convert_from_path
images = convert_from_path('input_pdf.pdf',output_folder=r'C:\Users\Baith')
images[0].save('output.jpg')
After running the code above, I got this output:
[Screenshot of the output image]
Since pdf2image is only a thin wrapper around pdftoppm, itself part of poppler, I would advise trying different parameters with the CLI tools to see if a specific combination works.
As for pdf2image itself, you might want to try use_cropbox=True and see if it still adds lines.
Feel free to open an issue directly on the repository. If you can provide a sample PDF, I would be happy to assist with the issue.
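One way to try parameter combinations with the CLI tool is sketched below. It assumes pdftoppm is on PATH and uses its -r, -jpeg/-png/-tiff and -cropbox options; the helper names and output prefixes are hypothetical.

```python
import subprocess


def pdftoppm_command(pdf_path, output_prefix, dpi=200,
                     use_cropbox=False, image_format="jpeg"):
    """Build the pdftoppm argument list for one parameter combination."""
    if image_format not in ("jpeg", "png", "tiff"):
        raise ValueError("image_format must be 'jpeg', 'png' or 'tiff'")
    cmd = ["pdftoppm", "-r", str(dpi), "-" + image_format]
    if use_cropbox:
        # Render the crop box instead of the media box.
        cmd.append("-cropbox")
    cmd += [pdf_path, output_prefix]
    return cmd


def render_variants(pdf_path, run=subprocess.run):
    """Render the PDF with and without the crop box so the outputs can be compared."""
    for use_cropbox in (False, True):
        prefix = "out_cropbox" if use_cropbox else "out_mediabox"
        run(pdftoppm_command(pdf_path, prefix, use_cropbox=use_cropbox), check=True)
```

If the -cropbox variant comes out clean, passing use_cropbox=True to convert_from_path should give the same result from Python.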

photo format returned in telethon

I am using Telethon (a library for working with Telegram). I cannot figure out the output I get for photos. Can someone tell me how to convert this format to a viewable JPG? Thanks.
\x01\x17(\x91\xef\xa7G\xdaDx\xf6\xff\x00\xf5\xd4\xcbrH\xcf\xc8G\xa8\xff\x00\xf5\xd6la\xe5e\\\x92\t\xab\xe0\x05\x8c\x8c\xec\xe7\x08~\x9dk>g{"\xd4U\xaeL\x97\x06N\x13c\x11\xd4\x0ei%\x96\xe1W\xe4\x84\x13\xfe\xe9\xa8\xec\x14\x0b\xb9H\x1c\x15\xe2\xb4*\x96\xaa\xe2\x94l\xecQ\x13]y`\x98#>\x9bM\x15u\xf3\xb5\xbe\x94S$\xc8\x82\xd6Y\xd9\x19\xbe\xe9\x1ds\xcd>uX\x1b\x0e\xe3\'\x91\xc1\xa2\x8a\x96\x91Wc,\xa7\x8a\xdeWw\x9bp~\x80\x03\xd6\xae\xff\x00i[c;\x8f\xfd\xf3E\x15ih&\xee\xc6\xb6\xa9jP\xe1\xdb\xa7\xf7M\x14Q#\x8f
You need to download the image to be able to view it. To do so, you can use the download_media method, like so:
path = client.download_media(messageThatHasTheImage)
The method returns the path where the image was saved (a default location if you do not specify one).
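In asyncio-based code the call has to be awaited. A minimal sketch, assuming client is an already connected Telethon client and message is the message that carries the photo; the helper name save_photo and the default file name are hypothetical.

```python
import asyncio


async def save_photo(client, message, destination="photo.jpg"):
    """Download the media attached to `message` and return the saved path."""
    # download_media returns None when the message has no downloadable media.
    path = await client.download_media(message, file=destination)
    if path is None:
        raise ValueError("message has no media to download")
    return path
```

The returned path can then be opened in any image viewer.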

Is there a Python module that reads a PDF and converts it to text?

I mean a PDF that is a scanned image or something similar. Is there a module that converts it to text, or some other way to do it?
Edit: This isn't meant to be a duplicate. I want to know whether I can get text out of a scanned image, not a regular PDF.
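A wrapper for Tesseract OCR is available: https://pypi.python.org/pypi/tesserocr
Try PDFMiner, it might suit what you need. Note that it extracts text embedded in the PDF and does not perform OCR, so it will not help with pages that are purely scanned images.
http://www.unixuser.org/~euske/python/pdfminer/index.html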

Python library which allows me to add text on top of images

Do you know if there is a Python image library which allows me to add text on top of images?
PIL can do that (it's the standard image processing library for Python anyway).
Searching Google for "PIL add text on top of images" gives this as the first result: http://python-catalin.blogspot.com/2010/06/add-text-on-image-with-pil-module.html
In short, if you're looking for a library, look into PIL.
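A minimal sketch of drawing text with PIL, assuming a Pillow version that provides ImageDraw.textbbox. The helper names add_caption and corner_position, and the choice of the bottom-right corner, are illustrative only.

```python
def corner_position(image_size, text_size, margin=10):
    """Top-left coordinate that puts a text box in the bottom-right corner."""
    image_w, image_h = image_size
    text_w, text_h = text_size
    return (max(image_w - text_w - margin, 0),
            max(image_h - text_h - margin, 0))


def add_caption(image_path, output_path, text, fill=(255, 255, 255)):
    """Draw `text` in the bottom-right corner and save the result."""
    # Assumption: Pillow is installed. Imported here so corner_position
    # stays usable without it.
    from PIL import Image, ImageDraw, ImageFont

    with Image.open(image_path) as source:
        img = source.convert("RGB")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    # Measure the rendered text so it can be positioned inside the image.
    left, top, right, bottom = draw.textbbox((0, 0), text, font=font)
    position = corner_position(img.size, (right - left, bottom - top))
    draw.text(position, text, fill=fill, font=font)
    img.save(output_path)
    return output_path
```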

Python Imaging Library save function syntax

Simple one I think but essentially I need to know what the syntax is for the save function on the PIL. The help is really vague and I can't find anything online. Any help'd be great, thanks :).
From the PIL Handbook:
im.save(outfile, options...)
im.save(outfile, format, options...)
Simplest case:
im.save('my_image.png')
or whatever. In this case, the type of the image will be determined from the extension. Is there a particular problem you're having? Or specific saving option that you'd like to use but aren't sure how to do so?
You may be able to find additional information in the documentation on each file type. The PIL Handbook appendixes list the file types that are supported. In some cases, options are given for save. For example, on the JPEG file format page, we're told that save supports
quality
optimize, and
progressive
with notes about each option.
Image.save(filename[, format[, options]]). You can usually just use Image.save(filename) since it automatically figures out the file type for you from the extension.
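To make the format and options arguments concrete, here is a small sketch that passes the three JPEG options listed above as keyword arguments. The helper name save_as_jpeg and the default quality value are hypothetical.

```python
def save_as_jpeg(image, outfile, quality=85):
    """Save a PIL image as JPEG with explicit format and save options."""
    # JPEG cannot store an alpha channel, so convert other modes to RGB first.
    if image.mode != "RGB":
        image = image.convert("RGB")
    image.save(outfile, format="JPEG", quality=quality,
               optimize=True, progressive=True)
    return outfile

# Usage (requires PIL/Pillow):
#   from PIL import Image
#   save_as_jpeg(Image.open("my_image.png"), "my_image.jpg", quality=90)
```

Passing format explicitly is mainly useful when the output name has no recognizable extension or when writing to a file object.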
