Split .TIF file using PIL - python

I looked at the question "Split multi-page tiff with python" on splitting a .TIFF file, but to be honest I didn't fully understand the answers, and I'm hoping for a little clarification.
I am trying to take a .tif file containing multiple invoices and split it into individual pages, which will then be zipped and uploaded into a database. PIL is installed on the computers that will run this program, so I'd like to stick with the PIL library. I know that I can view information such as the size of each image using PIL once the file is open, but saving each page is where it gets dicey. (Example code below.)
def Split_Images(img,numFiles):
    ImageFile = Image.open(img)
    print ImageFile.size[0]
    print ImageFile.size[1]
    ImageFile.save('InvoiceTest1.tif')[0]
    ImageFile.save('InvoiceTest2.tif')[1]
However when I run this code I get the following Error:
TypeError: 'NoneType' object has no attribute '__getitem__'
Any Suggestions?
Thank you in advance,

You need the PIL Image "seek" method to access the different pages.
from PIL import Image
img = Image.open('multipage.tif')
for i in range(4):
    try:
        img.seek(i)
        img.save('page_%s.tif'%(i,))
    except EOFError:
        break
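The loop above is capped at four pages. Here is a minimal sketch that handles any page count, assuming a Pillow version that ships `ImageSequence`. The function name `split_tiff` and the output naming pattern are made up for illustration, and the demo builds its own small three-page TIFF in a temporary directory rather than using a real invoice file.

```python
import os
import tempfile

from PIL import Image, ImageSequence


def split_tiff(src_path, out_dir):
    """Save every page of a multi-page TIFF as its own file; return the paths."""
    out_paths = []
    with Image.open(src_path) as img:
        # ImageSequence.Iterator seeks to each frame in turn and stops at the end.
        for index, page in enumerate(ImageSequence.Iterator(img)):
            out_path = os.path.join(out_dir, "page_%d.tif" % index)
            page.save(out_path)  # saves only the current frame
            out_paths.append(out_path)
    return out_paths


# Demo: build a three-page TIFF in a temporary directory, then split it.
work_dir = tempfile.mkdtemp()
source = os.path.join(work_dir, "multipage.tif")
pages = [Image.new("RGB", (32, 32), color) for color in ("red", "green", "blue")]
pages[0].save(source, save_all=True, append_images=pages[1:])

split_paths = split_tiff(source, work_dir)
print(split_paths)
```

The returned list of paths can then be handed to whatever zips and uploads the pages.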

Related

Pass PIL Image to google cloud vision without saving and reading

UPDATE BELOW
Is there a way to pass a PIL Image to google cloud vision?
I tried to use io.BytesIO, io.StringIO and Image.tobytes(), but I always get:
Traceback (most recent call last):
"C:\Users\...\vision_api.py", line 20, in get_text
image = vision.Image(content)
File "C:\...\venv\lib\site-packages\proto\message.py", line 494, in __init__
raise TypeError(
TypeError: Invalid constructor input for Image:b'Ma\x81Ma\x81La\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81La\x81Ma\x81Ma\x81Ma\x81Ma\x80Ma\x81La\x81Ma\x81Ma\x81Ma\x80Ma\x81Ma\x81Ma\x81Ma\x8 ...
or this if I pass the PIL-Image directly:
TypeError: Invalid constructor input for Image: <PIL.Image.Image image mode=RGB size=480x300 at 0x1D707131DC0>
This is my code:
image = Image.open(path).convert('RGB') # Opening the saved image
cropped_image = image.crop((30, 900, 510, 1200)) # Cropping the image
vision_image = vision.Image()  # I tried the different options here; I need to pass the image, but I don't know how
client = vision.ImageAnnotatorClient()
response = client.text_detection(image=vision_image) # Text detection using google-vision-api
FOR CLARITY:
I want Google text detection to analyse only a certain part of an image saved on my disk. So my idea was to crop the image using PIL and then pass the cropped image to google-vision. But it is not possible to pass a PIL image to vision.Image; I get the error above.
The documentation from Google.
This can be found in the vision.Image class:
Attributes:
content (bytes):
Image content, represented as a stream of bytes. Note: As
with all ``bytes`` fields, protobuffers use a pure binary
representation, whereas JSON representations use base64.
Currently, this field only works for BatchAnnotateImages
requests. It does not work for AsyncBatchAnnotateImages
requests.
A working option is to save the PIL-Image as a PNG/JPG on my disk and load it using:
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()
vision_image = vision.Image(content=content)
But this is slow and seems unnecessary. For me, the whole point of using the google-vision API is its speed compared to OpenCV.
UPDATE as of 25/9/2021
from PIL import Image
from io import BytesIO
from google.cloud import vision
with open('images/screenshots/screenshot.png', 'rb') as image_file:
    data = image_file.read()
try:
    image = vision.Image(content=data)
    print('worked')
except TypeError:
    print('failed')
im = Image.open('images/screenshots/screenshot.png')
buffer = BytesIO()
im.save(buffer, format='PNG')
try:
    image = vision.Image(buffer.getvalue())
    print('worked')
except TypeError:
    print('failed')
The first version works as expected, but I can't get the second one to work as @Mark Setchell recommended. The first ~50 bytes of the two inputs are the same; the rest is completely different.
UPDATE as of 26/9/2021
Both inputs are of type <class 'bytes'>. The complete error stack can be seen at the top of the question.
Using this code:
print(input_data[:200])
print(type(input_data))
I get the following output:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x048\x00\x00\x07\x80\x08\x06\x00\x00\x00+a\xe7\n\x00\x00\x00\x04sBIT\x08\x08\x08\x08|\x08d\x88\x00\x00 \x00IDATx\x9c\xec\xbdy\xd8-\xc7Y\x1f\xf8\xab\xea>\xe7\xdb\xef\xaa\xbbk\xb3%\xcb\x8b\x16[\x12\xc6\xc8\xbb,\x1b\x03\x06\xc6\x8111\x93#2y\xc2381\x8b1\x90\x10\x9e\xf18\x93\x10\x0811\x84\x192\x0c3\x9e\x1020\x03\x03\xc3\xb0\x04\xf0C0\xc6\x96m\xc9\x96m\xed\xb2dI\x96\xaetu\xf7\xed\xdb\xcf\xe9\xae\x9a?j\xe9\xea\xbd\xba\xbb\xbaO\x9f\xef\x9e\xd7\xd6\xfd\xfat\xbf\xf5Vu-o\xbd\xf5\xeb\xb7\xde"\xef\xff\xc7\'8\x1c\x13\x07\x00\xd2\x82\xcc6\xe5\xc6\xa8B&'
<class 'bytes'>
for the working input.
And:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x048\x00\x00\x07\x80\x08\x06\x00\x00\x00+a\xe7\n\x00\x01\x00\x00IDATx\x9c\xec\xbdw\x80$\xc7u\x1f\xfc\xab\xea\xeeI\x9bw/\'\x1cr\xce\x04#\x10\x04A\x82`\x84\x95%J"\x95,\xcb\x1f%\x91T\xb0$*}\x1fM\xd9\x96\x95EY\x94(\xc9\xb6\x92i+\x90\x12\x83(3)0\x82\x08$rN\x07\\\xce\xb7\xb7yBw\xd5\xf7G\x85\xaeN3\xdd=\xdd\xb3\xb3{\xfb\xc8\xc3\xceLW\xbd\xca\xaf\xde\xfb\xf5\xabW\xe4{\xdeu\x84\xa3`\xe2\x00#J\xe0Y&\xdf\x00e($\x94\x94\'p\xcc\xc3\xda\xe7Y\x0c\xf1Te\x13\xbf\xcc>\xfa:]Y=x\x84\x7f\xe8\xc23u\x1f\x91l\xfd\x99'
<class 'bytes'>
for the failing input.
As far as I can tell, you start off with a PIL Image and you want to obtain a PNG image in memory without going to disk. So you need this:
#!/usr/bin/env python3
from PIL import Image
from io import BytesIO
# Create PIL Image like you have - filled with red
im = Image.new('RGB', (320,240), (255,0,0))
# Create in-memory PNG - like you want for Google Cloud Vision
buffer = BytesIO()
im.save(buffer, format="PNG")
# Look at first few bytes
PNG = buffer.getvalue()
print(PNG[:20])
It prints this, which is exactly what you would get if you wrote the image to disk as a PNG and then read it back as binary - except this does it in memory without going to disk:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01#'
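Wrapped as a helper, the in-memory conversion might look like the sketch below. The name `pil_to_png_bytes` is hypothetical. Note one difference between the two calls in the question's update: the working call passes the bytes as `content=data`, while the failing one passes them positionally. The commented-out last line assumes the keyword form is what `vision.Image` expects; it is left as a comment because it needs the Google client library.

```python
from io import BytesIO

from PIL import Image


def pil_to_png_bytes(im):
    """Encode a PIL image as PNG entirely in memory and return the bytes."""
    buffer = BytesIO()
    im.save(buffer, format="PNG")
    return buffer.getvalue()


# Stand-in for the cropped image from the question.
cropped_image = Image.new("RGB", (480, 300), (255, 0, 0))
png_bytes = pil_to_png_bytes(cropped_image)
print(png_bytes[:8])

# Assumption: as in the working on-disk variant, the bytes go in by keyword.
# vision_image = vision.Image(content=png_bytes)
```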
It would be good to have the whole error stack and a more accurate code snippet. But from the information presented, this seems to be a confusion between two different "images", probably a copy/paste error, as the tutorials have exactly this line:
response = client.text_detection(image=image)
But in those tutorials, image is created by vision.Image(), so I think in the presented code this should be:
response = client.text_detection(image=vision_image)
If I understand the code snippet correctly, image is a PIL Image, while vision_image is the Vision Image that should be passed to the text_detection method. So whatever is done in vision.Image() has no effect on the error message.

How can I pytest a function that creates a list of objects?

I have a function that takes a directory of images, reads them, and stores them in a list.
When pytesting a basic example that reads 3 images, the test fails: the image objects report different memory addresses, and the assertion fails.
import os
from PIL import Image
def getImages(imageDir):
    files = os.listdir(imageDir)
    images = []
    for file in files:
        # Getting the full image name
        filePath = os.path.abspath(os.path.join(imageDir, file))
        try:
            # explicit load to prevent resources crunch
            fp = open(filePath, "rb")
            im = Image.open(fp)
            images.append(im)
            # force loading the image data from file
            im.load()
            # close the file
            fp.close()
        except Exception:
            # skip
            print("Invalid image: %s" % (filePath,))
    return images
def test_for_clean_data():
    assert getImages("test_images") == [Image.open("test_images/01.jpg"),
                                        Image.open("test_images/02.jpg"),
                                        Image.open("test_images/03.jpg")]
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2448x2765 at 0x1E48F5872C8>
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2448x2765 at 0x1E48F5878C8>
As shown in the example error provided by the console, the same image has different properties when tested.
The function under test is based on PIL.Image.
Perhaps, as someone suggested the test is flawed in its origin. If anyone knows a better way to pytest that the function is properly working, I would be more than happy to try a new idea. There's so much to learn.
Suggestions for correct test naming are also welcome.
Eyeballing the code, it looks like one potential reason your objects differ is that you call im.load() in getImages(), but not when you open your test images. Does this work? This is just a quick guess, I haven't tested it.
assert getImages("test_images") == [Image.open("test_images/01.jpg").load(),
Image.open("test_images/02.jpg").load(),
Image.open("test_images/03.jpg").load()]
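One caveat with the guess above: `Image.load()` returns a pixel-access object rather than the image, so that list would not contain images. A hedged alternative is to compare what the images contain instead of the objects themselves. The helper name `same_pixels` is made up for this sketch, and the demo writes its own small PNG to a temporary directory.

```python
import os
import tempfile

from PIL import Image


def same_pixels(a, b):
    """True when two PIL images have identical mode, size and pixel data."""
    return a.mode == b.mode and a.size == b.size and a.tobytes() == b.tobytes()


# Demo: write one image to disk, then open it twice as separate objects.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "01.png")
Image.new("RGB", (8, 8), (10, 20, 30)).save(path)

first = Image.open(path)
second = Image.open(path)
other = Image.new("RGB", (8, 8), (0, 0, 0))

print(same_pixels(first, second))
print(same_pixels(first, other))
```

In the test itself this could become a length check plus `all(same_pixels(a, b) for a, b in zip(actual, expected))`. Since `os.listdir()` returns names in arbitrary order, sorting the file list inside `getImages()` first would make the comparison order-stable.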

Python Script to detect broken images

I wrote a Python script to detect broken images and count them.
The problem is that my script reports every image as bad instead of only the broken ones. How can I fix this? I referred to
"How to check if a file is a valid image file?" for my code.
My code:
import os
from os import listdir
from PIL import Image
count=0
for filename in os.listdir('/Users/ajinkyabobade/Desktop/2'):
    if filename.endswith('.JPG'):
        try:
            img=Image.open('/Users/ajinkyabobade/Desktop/2'+filename)
            img.verify()
        except(IOError,SyntaxError)as e:
            print('Bad file : '+filename)
            count=count+1
print(count)
I have added another SO answer here that extends the PIL solution to better detect broken images.
I also implemented this solution in my Python script here on GitHub.
I also verified that damaged (jpg) files are frequently not 'broken' images, i.e., a damaged picture file sometimes remains a legitimate picture file: the original image is lost or altered, but you are still able to load it.
I quote the other answer for completeness:
You can use the Python Pillow (PIL) module, with most image formats, to check whether a file is a valid and intact image file.
If you also aim to detect broken images, @Nadia Alramli correctly suggests the im.verify() method, but this does not detect all possible image defects; e.g., im.verify() does not detect truncated images (which most viewers load with a greyed area).
Pillow is able to detect these types of defects too, but you have to apply an image manipulation or an image decode/recode in order to trigger the check. Finally, I suggest using this code:
try:
    im = Image.open(filename)
    im.verify()  # I also run verify; I don't know whether it catches other types of defects
    im.close()  # reopening is necessary in my case
    im = Image.open(filename)
    im.transpose(Image.FLIP_LEFT_RIGHT)
    im.close()
except Exception:
    # manage exceptions here
    pass
In case of image defects this code will raise an exception.
Please consider that im.verify is about 100 times faster than performing the image manipulation (and I think that flip is one of the cheaper transformations).
With this code you are going to verify a set of images at about 10 MBytes/sec (modern 2.5Ghz x86_64 CPU).
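The two-step check can be packaged as a small predicate. This is only a sketch: the name `is_intact_image` is hypothetical, it uses `im.load()` as the full-decode step where the answer uses `transpose`, and it assumes Pillow's default settings (truncated images are not silently accepted). The demo creates a valid PNG and a copy cut in half.

```python
import os
import tempfile

from PIL import Image


def is_intact_image(path):
    """Return True if Pillow can both verify and fully decode the file."""
    try:
        with Image.open(path) as im:
            im.verify()  # cheap structural check
        # verify() leaves the image unusable, so reopen before decoding.
        with Image.open(path) as im:
            im.load()  # full decode; missing pixel data raises here
        return True
    except Exception:
        return False


# Demo: one valid PNG and one copy cut off halfway through the file.
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "good.png")
bad_path = os.path.join(tmp_dir, "truncated.png")
Image.new("RGB", (64, 64), (200, 30, 30)).save(good_path)
with open(good_path, "rb") as src:
    data = src.read()
with open(bad_path, "wb") as dst:
    dst.write(data[: len(data) // 2])

print(is_intact_image(good_path), is_intact_image(bad_path))
```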
For the other formats psd,xcf,.. you can use Imagemagick wrapper Wand, the code is as follows:
im = wand.image.Image(filename=filename)
im.flip()
im.close()
But from my experiments, Wand does not detect truncated images; I think it loads the missing parts as a greyed area without any warning.
I read that ImageMagick has an external command, identify, that could do the job, but I have not found a way to invoke it programmatically and I have not tested this route.
I suggest always performing a preliminary check that the file size is not zero (or very small); it is a very cheap test:
statfile = os.stat(filename)
filesize = statfile.st_size
if filesize == 0:
    # manage here the 'faulty image' case
    pass
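As a self-contained illustration of that size pre-check, here is a standard-library sketch. The helper name `is_suspiciously_small` and the `min_bytes` threshold are assumptions for the example, and the two files are dummies created in a temporary directory.

```python
import os
import tempfile


def is_suspiciously_small(path, min_bytes=1):
    """True when the file holds fewer than min_bytes bytes."""
    return os.stat(path).st_size < min_bytes


# Demo: one empty file and one file with some bytes in it.
tmp_dir = tempfile.mkdtemp()
empty_path = os.path.join(tmp_dir, "empty.jpg")
filled_path = os.path.join(tmp_dir, "filled.jpg")
open(empty_path, "wb").close()
with open(filled_path, "wb") as f:
    f.write(b"\xff\xd8\xff\xe0" + b"\x00" * 100)

print(is_suspiciously_small(empty_path), is_suspiciously_small(filled_path))
```

Raising `min_bytes` above 1 covers the "very small" case mentioned above; a sensible threshold depends on the images involved.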
You are building a bad path with
img=Image.open('/Users/ajinkyabobade/Desktop/2'+filename)
Try the following instead (by adding / to the end of the directory path)
img=Image.open('/Users/ajinkyabobade/Desktop/2/'+filename)
or
img=Image.open(os.path.join('/Users/ajinkyabobade/Desktop/2', filename))
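To see the difference without touching any files, compare the two strings directly. The file name here is a made-up example.

```python
import os

directory = "/Users/ajinkyabobade/Desktop/2"
filename = "IMG_0001.JPG"  # hypothetical file name

concatenated = directory + filename         # no separator: ".../2IMG_0001.JPG"
joined = os.path.join(directory, filename)  # separator inserted for you

print(concatenated)
print(joined)
```

With the concatenated form every `Image.open` call points at a file that does not exist, so every file ends up in the `except` branch.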
Try the code below; it worked fine for me. It identifies the bad/corrupted images and removes them as well. If you only want to print the bad/corrupted file names, drop the final line that deletes the file.
for filename in listdir('/Users/ajinkyabobade/Desktop/2/'):
    if filename.endswith('.JPG'):
        try:
            img = Image.open('/Users/ajinkyabobade/Desktop/2/'+filename) # open the image file
            img.verify() # verify that it is, in fact an image
        except (IOError, SyntaxError) as e:
            print(filename)
            os.remove('/Users/ajinkyabobade/Desktop/2/'+filename)
Note that calling Image.load(filename) gives me an error saying that it is not available; Image.open(filename) appears to work.
I was also getting errors using:
except (IOError, SyntaxError) as e:
I just changed that to:
except:
and it worked fine.

OpenCV Python not opening images with imread()

I'm not entirely sure why this is happening but I am in the process of making a program and I am having tons of issues trying to get opencv to open images using imread. I keep getting errors saying that the image is 0px wide by 0px high. This isn't making much sense to me so I searched around on here and I'm not getting any answers from SO either.
I have taken about 20 pictures and they are all using the same device. Probably 8 of them actually open and work correctly, the rest don't. They aren't corrupted either because they open in other programs. I have triple checked the paths and they are using full paths.
Is anyone else having issues like this? All of my files are .jpgs and I am not seeing any problems on my end. Is this a bug or am I doing something wrong?
Here is a snippet of the code that I am using that is reproducing the error on my end.
imgloc = "F:\Kyle\Desktop\Coinjar\Test images\ten.png"
img = cv2.imread(imgloc)
cv2.imshow('img',img)
When I change the file, I only adjust the file name itself; the rest of the path doesn't change. It just refuses to accept some of my images, which are essentially the same as the ones that work.
I am getting this error from a later part of the code where I try to use img.shape
Traceback (most recent call last):
File "F:\Kyle\Desktop\Coinjar\CoinJar Test2.py", line 14, in <module>
height, width, depth = img.shape
AttributeError: 'NoneType' object has no attribute 'shape'
and I am getting this error when I try to show a window from the code snippet above.
Traceback (most recent call last):
File "F:\Kyle\Desktop\Coinjar\CoinJar Test2.py", line 11, in <module>
cv2.imshow('img',img)
error: ..\..\..\..\opencv\modules\highgui\src\window.cpp:261: error: (-215) size.width>0 && size.height>0 in function cv::imshow
You probably have a problem with the special meaning of \ in string literals, such as \t or \n.
Use \\ in place of \
imgloc = "F:\\Kyle\\Desktop\\Coinjar\\Test images\\ten.png"
or use prefix r'' (and it will treat it as raw text without special codes)
imgloc = r"F:\Kyle\Desktop\Coinjar\Test images\ten.png"
EDIT:
Some modules even accept /, as in Linux paths:
imgloc = "F:/Kyle/Desktop/Coinjar/Test images/ten.png"
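A quick way to see what happens to the path is to print the `repr` of each variant. The sketch uses only the tail of the path from the question, since `\t` in `\ten.png` is the part that gets interpreted.

```python
plain = "Test images\ten.png"     # "\t" is read as a TAB character
raw = r"Test images\ten.png"      # raw string: backslash and "t" are kept
doubled = "Test images\\ten.png"  # escaped backslash, same result as raw

print(repr(plain))
print(repr(raw))
print(raw == doubled)
```

This would also explain why only some of the images fail: only file names that start with a letter forming a recognised escape after the backslash (such as `t` or `n`) are affected.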
From my experience, file paths that are too long (OS dependent) can also cause cv2.imread() to fail.
Also, when it does fail, it often fails silently, so it is hard to even realize that it failed; usually something further down in the code is what raises the error.
Hope this helps.
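One way to surface the silent failure right where it happens is a thin wrapper around the reader. This is a sketch, not part of OpenCV: `load_or_raise` is a hypothetical helper, and the demo passes a stand-in reader that always returns None so it runs without `cv2` installed.

```python
import os
import tempfile


def load_or_raise(path, reader):
    """Call reader(path) and raise instead of silently returning None."""
    if not os.path.isfile(path):
        raise FileNotFoundError("No such file: %r" % (path,))
    image = reader(path)
    if image is None:
        raise ValueError("Reader returned None for: %r" % (path,))
    return image


# With OpenCV this would be used as: img = load_or_raise(imgloc, cv2.imread)
# Demo with a stand-in reader that always returns None, like a failed imread.
handle, existing_path = tempfile.mkstemp(suffix=".jpg")
os.close(handle)

try:
    load_or_raise(existing_path, lambda p: None)
    decode_result = "ok"
except ValueError:
    decode_result = "ValueError"

try:
    load_or_raise(existing_path + ".missing", lambda p: None)
    missing_result = "ok"
except FileNotFoundError:
    missing_result = "FileNotFoundError"

print(decode_result, missing_result)
```

Separating "file not found" from "file found but not decoded" also helps narrow down whether the path or the image content is the problem.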
Faced the same problem on Windows: cv.imread returned None when reading jpg files from a subfolder. The same code and folder structure worked on Linux.
Found out that cv.imread processes the same jpg files, if they are in the same folder as the python file.
My workaround:
copy the image file to the python file folder
use this file in cv.imread
remove redundant image file
import os
import shutil
import cv2 as cv
image_dir = os.path.join('path', 'to', 'image')
image_filename = 'image.jpg'
full_image_path = os.path.join(image_dir, image_filename)
image = cv.imread(full_image_path)
if image is None:
    shutil.copy(full_image_path, image_filename)
    image = cv.imread(image_filename)
    os.remove(image_filename)
...
I had a lot of trouble with cv.imread() not finding my image. I think I tried everything involving changing the path. os.path.exists(file_path) also returned True.
I finally solved the problem by loading the images with imageio.
img = imageio.imread('file_path')
This also loads the image into a NumPy array, so you can use functions like cv.matchTemplate() on it. If you are working with multiple images, though, I would recommend reading all of them with imageio, because I found differences between the arrays produced by imread() from the two libraries (OpenCV, imageio) for a file that both of them could open.
I hope this helps someone.
Take care to:
try imread() with a reliable picture,
and use the correct path for your context (see Kyle772's answer). For me, either / or \\ works.
I lost a couple of hours trying with 2 images saved from a browser. As soon as I used an image from my own camera, it worked fine.
#context windows10 / anaconda / python 3.2.0
import cv2
print(cv2.__version__) # 3.2.0
imgloc = "D:/violettes/Software/Central/test.jpg" #this path works fine.
# imgloc = "D:\\violettes\\Software\\Central\\test.jpg" this path works fine also.
#imgloc = "D:\violettes\Software\Central\test.jpg" #this path fails.
img = cv2.imread(imgloc)
height, width, channels = img.shape
print (height, width, channels)
I know that the question is already answered, but in case anybody is still not able to load images with imread: it may be because the path string contains characters which imread does not accept,
for example umlauts and diacritical marks.
My suggestion for everyone facing the same problem is to try this:
cv2.imshow("image", img)
The img is keyword. Never forget.
When you get an error like AttributeError: 'NoneType' object has no attribute 'shape',
try new_image = image.copy()

PIL - Open image is not able to be read

I'm trying to read a file's format so I can correctly assign a new name to it and write it to disk, but once Image.open() has been called on the image, I cannot write the image to disk. For example:
This works:
>>>file = open('708864.jpg')
>>> open('lala.jpeg', 'w').write(file.read())
But, this doesn't
>>>import Image
>>>im = Image.open('708864.jpg')
>>> im.format
>>> open('lala.jpeg', 'w').write(file.read())
It creates a corrupted file (lala.jpeg) which is unable to be opened by any software.
I'm suspecting the culprit is the Image.open(). And after trying to locate an Image.close() statement, I was unable to find one. How would you "close" this image, so I can still write it to disk?
As suggested in my comment, im.save('lala.jpg') is the way to go.
For all the other fun methods on an Image object, you can look at the documentation.
A possible workaround (just an idea):
import Image
import StringIO
file = open('/home/mrok/1.jpg')
output = StringIO.StringIO(file.read())
im = Image.open('/home/mrok/1.jpg')
im.format
open('/home/mrok/2.jpg', 'w').write(output.getvalue())
output.close()
As said in a comment, I ended up using a function I never knew about before, Image.save() , which quickly solves my problem.
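For anyone who still wants to copy the original bytes rather than re-encode with `save()`, two details in the snippets from the question may matter, assuming both were typed into the same session: a file object that has already been read to the end returns nothing on a second `read()`, and image files should be opened in binary mode (`'rb'` / `'wb'`). The sketch below uses only the standard library, with dummy bytes standing in for a real JPEG.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "source.jpg")
dst_path = os.path.join(tmp_dir, "copy.jpeg")

# Stand-in bytes for the demo; not a decodable JPEG.
payload = b"\xff\xd8\xff\xe0" + bytes(range(256))
with open(src_path, "wb") as f:
    f.write(payload)

with open(src_path, "rb") as src:
    first_read = src.read()   # the whole file
    second_read = src.read()  # already at end of file, so nothing is left
    src.seek(0)               # rewind before reading again
    third_read = src.read()

with open(dst_path, "wb") as dst:  # "wb" writes the bytes unchanged
    dst.write(third_read)

print(len(first_read), len(second_read), len(third_read))
```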
