I'm coding in Python 2.7 and need to read a PDF, render its first page as an image, and then read the values of the two barcodes that image contains. These are the two pieces of code I have so far (they still need a lot of polishing before I move this to a real environment):
Python code to obtain the images from the PDF (adapted from a tutorial):
from wand.image import Image as wi
pdf = wi(filename="test.pdf", resolution=300)
pdfImage = pdf.convert("png")
i = 1
# Each entry in .sequence is one page of the PDF
for img in pdfImage.sequence:
    page = wi(image=img)
    page.save(filename="test" + str(i) + ".png")
    i += 1
Python process to read the barcodes from an image:
from pyzbar.pyzbar import decode
from PIL import Image
import cv2
import numpy
decodedObjects = decode(Image.open('test2.png'))
obj = decodedObjects
print(obj)
# Second attempt: pass the image as an OpenCV array instead of a PIL image
obj = decode(cv2.imread('test2.png'))
print(obj)
According to the documentation for pyzbar's decode function, it scans all the barcodes contained in the image, but in both cases above I only get the first barcode. Is there a way to force the function to keep scanning the image, or to point it at a specific region of the image, after it has finished with the first barcode?
decode returns a list with one entry per barcode found. Iterate over that list and read the data attribute of each entry.
Here's an example:
from pyzbar.pyzbar import decode
from PIL import Image
import cv2
import numpy
decodedObjects = decode(Image.open('test2.png'))
obj = decodedObjects
for bar in obj:
    print(bar.data)
By the way, print is a statement in Python 2 and a function in Python 3. With a single argument, print(bar.data) works in both; the statement form print bar.data works only in Python 2.7.
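A minimal sketch of collecting every decoded value into a list, assuming each result exposes data (bytes) and type attributes as pyzbar's results do. The helper name barcode_values and the FakeResult stand-in are hypothetical; they exist only so the loop can be run without an image or the zbar library.

```python
from collections import namedtuple

def barcode_values(decoded_objects):
    """Return (type, text) pairs for every decoded barcode, in order."""
    values = []
    for bar in decoded_objects:
        data = bar.data
        # The payload arrives as bytes; decode it to text for display.
        if isinstance(data, bytes):
            data = data.decode("utf-8")
        values.append((bar.type, data))
    return values

# Stand-in objects shaped like decode() results (hypothetical sample values).
FakeResult = namedtuple("FakeResult", ["data", "type"])
sample = [FakeResult(b"0123456789012", "EAN13"), FakeResult(b"ABC-123", "CODE128")]

for kind, text in barcode_values(sample):
    print("%s: %s" % (kind, text))
```

With the real library this would be called as barcode_values(decode(Image.open('test2.png'))). If the list still has only one entry, the second barcode was not detected at all, which points at the image (resolution, contrast) rather than at the loop.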
I am using the following code to draw a rectangle around text in an image that matches a date pattern, and it's working fine.
import re
import cv2
import pytesseract
from PIL import Image
from pytesseract import Output
img = cv2.imread('invoice-sample.jpg')
d = pytesseract.image_to_data(img, output_type=Output.DICT)
keys = list(d.keys())
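date_pattern = r'^(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[012])/(19|20)\d\d$'
n_boxes = len(d['text'])
for i in range(n_boxes):
    # conf may be reported as a float string in some versions, so parse it as float
    if float(d['conf'][i]) > 60:
        if re.match(date_pattern, d['text'][i]):
            (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
            img = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow('img', img)
cv2.waitKey(0)
# img is a NumPy array (BGR), so convert it to a PIL image before saving as PDF
Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB)).save("sample.pdf")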
Now, at the end, I get a PDF with a rectangle around each matched date.
I want to give this program a scanned PDF as input instead of the image above.
It should first convert the PDF into an image format readable by OpenCV so the same processing can be applied.
Please help.
(Any workaround is fine. I need a solution that converts the PDF to an image and uses it directly, without saving it to disk and reading it back, because I have a lot of PDFs to process.)
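The date pattern used in the loop above can be checked on its own, without Tesseract or OpenCV. The sketch below uses only the standard library; the helper name is_date and the sample tokens are made up for illustration.

```python
import re

# Same pattern as in the question, written as a raw string: DD/MM/YYYY.
DATE_PATTERN = re.compile(r'^(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[012])/(19|20)\d\d$')

def is_date(token):
    """Return True if the OCR token looks like a DD/MM/YYYY date."""
    return DATE_PATTERN.match(token) is not None

# Hypothetical OCR tokens, only for illustration.
tokens = ["Invoice", "31/12/2019", "12/31/2019", "01/01/1999", "1/1/2020"]
print([t for t in tokens if is_date(t)])
# → ['31/12/2019', '01/01/1999']
```

Note that the pattern requires two-digit days and months and rejects month-first dates such as 12/31/2019, so OCR output in other date formats will not be boxed.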
There is a library named pdf2image. You can install it with pip install pdf2image. Then, you can use the following to convert pages of the pdf to images of the required format:
from pdf2image import convert_from_path
pages = convert_from_path("pdf_file_to_convert")
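for i, page in enumerate(pages):
    # Pillow's format name is "JPEG"; number the files so pages don't overwrite each other
    page.save("page_image_%d.jpg" % i, "JPEG")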
Now you can use this image to apply opencv functions.
You can use BytesIO to do your work without saving the file:
from io import BytesIO
from PIL import Image
with BytesIO() as f:
    page.save(f, format="JPEG")
    f.seek(0)
    img_page = Image.open(f)
    img_page.load()  # read the pixel data before the buffer is closed
From PDF to opencv ready array in two lines of code. I have also added the code to resize and view the opencv image. No saving to disk.
# imports
from pdf2image import convert_from_path
import cv2
import numpy as np
# convert PDF to image then to array ready for opencv
pages = convert_from_path('sample.pdf')
img = cv2.cvtColor(np.array(pages[0]), cv2.COLOR_RGB2BGR)  # PIL gives RGB, OpenCV expects BGR
# opencv code to view image
img = cv2.resize(img, None, fx=0.5, fy=0.5)
cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Remember that if poppler is not in your Windows PATH variable, you can pass its location to convert_from_path:
poppler_path = r'C:\path_to_poppler'
pages = convert_from_path('sample.pdf', poppler_path=poppler_path)
You can use the pdf2image library. Install it with pip install pdf2image. You can then convert the file into one or multiple images readable by cv2. The next sample of code converts the PIL images into arrays readable by cv2:
Note: The following code requires numpy (pip install numpy).
from pdf2image import convert_from_path
import numpy as np
images_of_pdf = convert_from_path('source2.pdf') # Convert PDF to List of PIL Images
readable_images_of_pdf = [] # Create a list for the for loop to put the images into
for PIL_Image in images_of_pdf:
    readable_images_of_pdf.append(np.array(PIL_Image)) # Add items to list
The next bit of code can convert the pdf into one big image readable by cv2:
import cv2
import numpy as np
from pdf2image import convert_from_path
image_of_pdf = np.concatenate(tuple(convert_from_path('/path/to/pdf/source.pdf')), axis=0)
The pdf2image library's convert_from_path() function returns a list containing each PDF page as a PIL image. We convert the list into a tuple for numpy's concatenate function, which stacks the images on top of each other (all pages must have the same width for this to work). If you want them side by side, change axis to 1, which concatenates the images horizontally, along the width. This next bit of code will show the image on the screen:
cv2.imshow("Image of PDF", image_of_pdf)
cv2.waitKey(0)
This will probably create a window on the screen that is too big. To resize the image for the screen you'll use the following code that uses cv2's built-in resize function:
import cv2
from pdf2image import convert_from_path
import numpy as np
image_of_pdf = np.concatenate(tuple(convert_from_path('source2.pdf')), axis=0)
size = 0.15 # 0.15 is equal to 15% of the original size.
height, width = image_of_pdf.shape[:2]
resized = cv2.resize(image_of_pdf, (int(width * size), int(height * size))) # dsize is (width, height)
cv2.imshow("Image of PDF", resized)
cv2.waitKey(0)
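The size tuple in the resize call above is easy to get backwards: a NumPy shape is (height, width, channels), while cv2.resize expects (width, height). A small sketch of that conversion in plain Python follows; the helper name scaled_size and the sample shape are assumptions for illustration only.

```python
def scaled_size(shape, scale):
    """Return (width, height) for cv2.resize from a NumPy-style shape.

    shape is (height, width) or (height, width, channels).
    """
    height, width = shape[:2]
    return (int(width * scale), int(height * scale))

# Three 1700x2200 pages stacked vertically (hypothetical numbers).
print(scaled_size((6600, 1700, 3), 0.25))
# → (425, 1650)
```

The result can be passed directly as the second argument of cv2.resize.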
On a 1920x1080 monitor, a size of 0.15 can comfortably display a 3-page document. The downside is that the quality is reduced dramatically. If you want to have the pages separated you can just use the original convert_from_path() function. The following code shows each page individually, to go to the next page press any key:
import cv2
from pdf2image import convert_from_path
import numpy
images_of_pdf = convert_from_path('source2.pdf') # Convert PDF to List of PIL Images
count = 0 # Start counting which page we're on
while True:
    cv2.imshow(f"Image of PDF Page {count + 1}", numpy.array(images_of_pdf[count])) # Display the page with its number
    cv2.waitKey(0) # Wait until a key is pressed
    cv2.destroyWindow(f"Image of PDF Page {count + 1}") # Destroy the current window
    count += 1 # Add to the counter by 1
    if count == len(images_of_pdf):
        break # Break out of the while loop before you get an "IndexError: list index out of range"
This is the first time I am working with OCR. I have an image and want to extract data from the image. My image looks like this:
I have 500 such images and have to record the parameters and their respective values. I'd rather do this through code than manually.
I have tried the Python pytesseract and PIL libraries. They perform well if the image contains only simple text. This is what I tried:
from PIL import Image, ImageEnhance, ImageFilter
from pytesseract import image_to_string
from pytesseract import image_to_boxes
im = Image.open("AHU.png")
im = im.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(2)
im = im.convert('1')
im.save('temp2.jpg')
text = image_to_string(Image.open('temp2.jpg'))
print(text)
What to do in this case where there are several parameters? All my images are similar with respect to position of the values.
How can I save or download this picture with Python?

Well, the start of the string (this part: data:image/jpeg;base64,) tells you it is base64-encoded. So you need to strip all that off and find the start of the image, which is /9j, and just grab the image part, which is the rest of the string.
b = b'...'
z = b[b.find(b'/9'):]
Then you need to base64-decode the result and wrap it in a BytesIO object that PIL can read. Then open it as a PIL Image and save it:
Image.open(io.BytesIO(base64.b64decode(z))).save('result.jpg')
So, the entire code will look like:
import base64
import io
from PIL import Image
# Initialise your data
b = b'...'
z = b[b.find(b'/9'):]
# save() returns None, so there is nothing useful to assign here
Image.open(io.BytesIO(base64.b64decode(z))).save('result.jpg')
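As an alternative to searching for /9j, the header can be split off at the first comma. The sketch below uses only the standard library and builds its own tiny sample string; the helper name decode_data_uri is hypothetical. It also shows where /9j comes from: JPEG files start with the bytes FF D8 FF, which base64-encode to /9j/.

```python
import base64

def decode_data_uri(uri):
    """Split a 'data:<mime>;base64,<payload>' string and return (mime, raw_bytes)."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)

# Build a sample URI from known bytes: the JPEG start marker plus filler.
raw = b"\xff\xd8\xff\xe0 example bytes"
uri = "data:image/jpeg;base64," + base64.b64encode(raw).decode("ascii")

mime, decoded = decode_data_uri(uri)
print(mime, decoded == raw)
# → image/jpeg True
```

The decoded bytes can be written straight to a file opened in 'wb' mode; going through PIL is only needed if you want to inspect or convert the image.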
Keywords: image, image processing, base64, b64, encode, encoded, decode, decoded, PIL, Pillow, Python
I'm trying to convert some PDFs to high-res JPEGs using ImageMagick. I'm working on Windows 10 (64-bit) with Python 3.62 (64-bit) and wand 0.4.4. At the command line I have:
$ /e/ImageMagick-6.9.9-Q16-HDRI/convert.exe -density 400 myfile.pdf -scale 2000x1000 test3.jpg
which is working well for me.
In python:
import os
from wand.image import Image
file_path = os.path.dirname(os.path.abspath(__file__))+os.sep+"myfile.pdf"
with Image(filename=file_path, resolution=400) as image:
    image.save()
    image_jpeg = image.convert('jpeg')
This gives me low-res JPEGs. How do I translate the command line above into my wand code so that it does the same thing?
edit:
I realized that the problem is that the input pdf has to be read into the Image object as a binary string, so based on http://docs.wand-py.org/en/0.4.4/guide/read.html#read-blob I tried:
with open(file_path,'rb') as f:
    image_binary = f.read()
with Image(blob=image_binary,resolution=400) as img:
    img.transform('2000x1000', '100%')
    img.make_blob('jpeg')
    img.save(filename='out.jpg')
This reads the file in ok, but the output is split into 10 files. Why? I need to get this into 1 high res jpeg.
EDIT:
I need to send the jpeg to an OCR api, so I was wondering if I could write the output to a file like object. Looking at https://www.imagemagick.org/api/magick-image.php#MagickWriteImageFile, I tried :
emptyFile = Image(width=1500, height=2000)
with Image(filename=file_path, resolution=400) as image:
    library.MagickResetIterator(image.wand)
    # Call C-API Append method.
    resource_pointer = library.MagickAppendImages(image.wand,
                                                  True)
    library.MagickWriteImagesFile(resource_pointer,emptyFile)
This gives:
File "E:/ENVS/r3/pdfminer.six/ocr_space.py", line 113, in <module>
test_file = ocr_stream(filename='test4.jpg')
File "E:/ENVS/r3/pdfminer.six/ocr_space.py", line 96, in ocr_stream
library.MagickWriteImagesFile(resource_pointer,emptyFile)
ctypes.ArgumentError: argument 2: <class 'TypeError'>: wrong type
How can I get this working?
"Why? I need to get this into 1 high res jpeg."
The PDF contains pages that ImageMagick considers individual images in a "stack". The wand library provides wand.image.Image.sequence to work with each page.
However, to append all images into a single JPEG, you can either iterate over each page and stitch them together, or call the C-API's MagickAppendImages method.
from wand.image import Image
from wand.api import library
import ctypes
# Map C-API not provided by wand library.
library.MagickAppendImages.argtypes = [ctypes.c_void_p, ctypes.c_int]
library.MagickAppendImages.restype = ctypes.c_void_p
with Image(filename="path_to_document.pdf", resolution=400) as image:
    # Do all your preprocessing first
    # Either work directly on the wand instance, or iterate over each page.
    # ...
    # To write all "pages" into a single image.
    # Reset the stack iterator.
    library.MagickResetIterator(image.wand)
    # Call C-API Append method.
    resource_pointer = library.MagickAppendImages(image.wand,
                                                  True)
    # Write C resource directly to disk.
    library.MagickWriteImages(resource_pointer,
                              "output.jpeg".encode("ASCII"),
                              False)
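The code above takes the C-API route. For the other route mentioned, iterating over the pages and stitching them together yourself, the layout arithmetic could be sketched as follows. This is plain Python with a hypothetical helper name and made-up page sizes; the actual pasting would be done with whatever compositing call your imaging library offers.

```python
def vertical_layout(page_sizes):
    """Given [(width, height), ...], return (canvas_size, offsets).

    canvas_size is (max width, total height); offsets holds the top-left
    (x, y) position at which each page would be pasted.
    """
    canvas_width = max(w for w, _ in page_sizes)
    offsets = []
    y = 0
    for _, height in page_sizes:
        offsets.append((0, y))
        y += height
    return (canvas_width, y), offsets

# Hypothetical page sizes in pixels.
canvas, offsets = vertical_layout([(3400, 4400), (3400, 4400), (3000, 2000)])
print(canvas, offsets)
# → (3400, 10800) [(0, 0), (0, 4400), (0, 8800)]
```

At 400 DPI a multi-page canvas gets large quickly, so it may be worth scaling each page before stitching rather than after.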
Update:
"I need to send the jpeg to an OCR api ..."
Assuming you're using OpenCV's Python API, you'll only need to iterate over each page and pass the image-file data to the OCR via numpy buffers.
from wand.image import Image
import numpy
import cv2
def ocr_process(file_data_buffer):
    """ Replace with whatever your OCR-API calls for """
    # imdecode needs a read flag as its second argument
    mat_instance = cv2.imdecode(file_data_buffer, cv2.IMREAD_COLOR)
    # ... work ...
source_image="path_to_document.pdf"
with Image(filename=source_image, resolution=400) as img:
    for page in img.sequence:
        file_buffer = numpy.asarray(bytearray(page.make_blob("JPEG")),
                                    dtype=numpy.uint8)
        ocr_process(file_buffer)
"so I was wondering if I could write the output to a file like object"
Don't assume that Python "image" objects (or the underlying C structures) from different libraries are compatible with each other.
Without knowing the OCR api, I can't help you past the wand part, but I can suggest one of the following...
Use temporary intermediate files. (slower I/O, but easier to learn/develop/debug)
with Image(filename=INPUT_PATH) as img:
    # work
    img.save(filename=OUTPUT_PATH)
# OCR work on OUTPUT_PATH
Use file descriptors if the OCR API supports it. (Same as above)
with open(INPUT_PATH, 'rb') as fd:
    with Image(file=fd) as img:
        pass  # work
        # OCR work ???
Use blobs. (faster I/O but need a lot more memory)
buffer = None
with Image(filename=INPUT_PATH) as img:
    # work
    buffer = img.make_blob(FORMAT)
if buffer:
    pass  # OCR work ???
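If the OCR client expects a file-like object rather than raw bytes, the blob from the last option can be wrapped in memory. The sketch below uses only the standard library; the helper name blob_to_file, the file name, and the stand-in bytes are assumptions, and whether a given OCR client accepts such an object depends on that client.

```python
import io

def blob_to_file(blob, name):
    """Wrap raw image bytes in a seekable in-memory file object.

    Some upload helpers look for a .name attribute on file objects,
    so one is attached here (the name itself is arbitrary).
    """
    stream = io.BytesIO(blob)
    stream.name = name
    return stream

# Stand-in bytes; in practice this would be the result of img.make_blob(FORMAT).
fake_blob = b"\xff\xd8\xff\xe0 not a real image"
fileobj = blob_to_file(fake_blob, "page.jpg")
print(fileobj.name, len(fileobj.read()))
```

Nothing is written to disk, at the cost of holding the whole encoded image in memory.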
Even More Updates
Wrapping all the comments together, a solution might be...
from wand.image import Image
from wand.api import library
import ctypes
import requests
# Map C-API not provided by wand library.
library.MagickAppendImages.argtypes = [ctypes.c_void_p, ctypes.c_int]
library.MagickAppendImages.restype = ctypes.c_void_p
with Image(filename='path_to_document.pdf', resolution=400) as image:
    # ... Do pre-processing ...
    # Reset the stack iterator.
    library.MagickResetIterator(image.wand)
    # Call C-API Append method.
    resource_pointer = library.MagickAppendImages(image.wand, True)
    # Convert to JPEG.
    library.MagickSetImageFormat(resource_pointer, b'JPEG')
    # Create size sentinel.
    length = ctypes.c_size_t()
    # Write image blob to memory.
    image_data_pointer = library.MagickGetImagesBlob(resource_pointer,
                                                     ctypes.byref(length))
    # Ensure success
    if image_data_pointer and length.value:
        # Create buffer from memory address
        payload = ctypes.string_at(image_data_pointer, length.value)
        # Define local filename.
        payload_filename = 'my_hires_image.jpg'
        # Post payload as multipart encoded image file with filename.
        requests.post(THE_URL, files={'file': (payload_filename, payload)})
What about something like:
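with Image(filename=file_path, resolution=400) as image:
    # transform() changes the image in place, so call it inside the with block
    image.transform('2000x1000', '100%')
    image.compression_quality = 100
    image.save(filename='out.jpg')
or, inside the same with block, resize directly:
    image.resize(2000, 1000)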
related:
https://github.com/dahlia/wand/blob/13c4f544bd271fe298ac8dde44fbf178b349361a/docs/guide/resizecrop.rst
Python 3 Wand How to make an unanimated gif from multiple PDF pages
Each TIFF file has 4 images in it. I do not wish to extract and save them if possible; I would just like to use a for loop to look at each of them (for example, look at the pixel [0,0]), and depending on what color it is in all 4 I will do something accordingly.
Is this possible using PIL?
If not, what should I use?
Rather than looping until an EOFError, you can iterate over the image pages using PIL.ImageSequence (which is effectively equivalent, as seen in the source code).
from PIL import Image, ImageSequence
im = Image.open("multipage.tif")
for i, page in enumerate(ImageSequence.Iterator(im)):
    page.save("page%d.png" % i)
You can use the "seek" method of a PIL image to have access to the different pages of a tif (or frames of an animated gif).
from PIL import Image
img = Image.open('multipage.tif')
for i in range(4):
    try:
        img.seek(i)
        print(img.getpixel((0, 0)))
    except EOFError:
        # Not enough frames in img
        break
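To act on the pixel value across all four pages, as the question describes, the per-frame lookups can be collected first and compared afterwards. This is a sketch under assumptions: the helper names frame_pixels and all_frames_match are hypothetical, and the sample values stand in for what a 4-page RGB TIFF might return.

```python
def frame_pixels(path, xy=(0, 0)):
    """Return the pixel value at xy for every frame of a multi-page image."""
    # Imported here so the helper below stays usable without Pillow installed.
    from PIL import Image, ImageSequence

    with Image.open(path) as img:
        return [frame.getpixel(xy) for frame in ImageSequence.Iterator(img)]

def all_frames_match(pixels, expected):
    """True if every frame has the expected value at the sampled position."""
    return len(pixels) > 0 and all(p == expected for p in pixels)

# Hypothetical values as they might come back from a 4-page RGB TIFF.
sample = [(255, 0, 0), (255, 0, 0), (255, 0, 0), (255, 0, 0)]
if all_frames_match(sample, (255, 0, 0)):
    print("pixel (0, 0) is red on every page")
else:
    print("pages differ")
```

In real use, sample would be replaced by frame_pixels('multipage.tif'). The shape of each value depends on the image mode: a tuple for RGB, a single number for grayscale.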
I had to do the same thing today.
I followed @stochastic_zeitgeist's code, with one improvement to speed things up: it converts each frame to an array directly instead of reading it pixel by pixel in a manual loop.
from PIL import Image
import numpy as np
def read_tiff(path):
    """
    path - Path to the multipage-tiff file
    """
    img = Image.open(path)
    images = []
    for i in range(img.n_frames):
        img.seek(i)
        images.append(np.array(img))
    return np.array(images)
Here's a method that reads a multipage tiff and returns the images as a numpy array
from PIL import Image
import numpy as np
def read_tiff(path, n_images):
    """
    path - Path to the multipage-tiff file
    n_images - Number of pages in the tiff file
    """
    img = Image.open(path)
    images = []
    for i in range(n_images):
        try:
            img.seek(i)
            slice_ = np.zeros((img.height, img.width))
            for j in range(slice_.shape[0]):
                for k in range(slice_.shape[1]):
                    # getpixel takes (x, y), i.e. (column, row)
                    slice_[j,k] = img.getpixel((k, j))
            images.append(slice_)
        except EOFError:
            # Not enough frames in img
            break
    return np.array(images)
Thanks to the answers on this thread I wrote this python module for reading and operating on multipage tiff files: https://github.com/mpascucci/multipagetiff
It also lets you color-code the image stack "depth-wise" and make z-projections.
Hope it can help