Python Script to detect broken images


I wrote a Python script to detect broken images and count them.
The problem is that my script reports every image instead of only the broken ones. How can I fix this? I referred to:
How to check if a file is a valid image file? for my code.
My code:
import os
from os import listdir
from PIL import Image
count=0
for filename in os.listdir('/Users/ajinkyabobade/Desktop/2'):
    if filename.endswith('.JPG'):
        try:
            img=Image.open('/Users/ajinkyabobade/Desktop/2'+filename)
            img.verify()
        except(IOError,SyntaxError)as e:
            print('Bad file : '+filename)
            count=count+1
print(count)

I have added another SO answer here that extends the PIL solution to better detect broken images.
I also implemented this solution in my Python script here on GitHub.
I also verified that damaged JPG files are frequently not 'broken' images: a damaged picture file sometimes remains a legitimate picture file. The original image is lost or altered, but you can still load it.
I quote the other answer for completeness:
You can use the Python Pillow (PIL) module, with most image formats, to check whether a file is a valid and intact image file.
If you also aim to detect broken images, @Nadia Alramli correctly suggests the im.verify() method, but it does not detect all possible image defects; for example, im.verify() does not detect truncated images (which most viewers load with a greyed area).
Pillow can detect these types of defects too, but you have to apply an image manipulation or an image decode/recode in order to trigger the check. In the end I suggest using this code:
from PIL import Image
try:
    im = Image.open(filename)  # Image.open takes the path (Image.load is not a module-level function)
    im.verify()  # I also run verify; I don't know whether it catches other types of defects
    im.close()  # reopening is necessary in my case
    im = Image.open(filename)
    im.transpose(Image.FLIP_LEFT_RIGHT)
    im.close()
except Exception:
    # manage exceptions here
    pass
If the image is defective, this code raises an exception.
Note that im.verify() is about 100 times faster than performing the image manipulation (and I think a flip is one of the cheaper transformations).
With this code you can verify a set of images at about 10 MBytes/sec (modern 2.5 GHz x86_64 CPU).
For other formats (psd, xcf, ...) you can use the ImageMagick wrapper Wand; the code is as follows:
import wand.image
im = wand.image.Image(filename=filename)
im.flip()  # flip is a method; call it to apply the transformation
im.close()
However, in my experiments Wand does not detect truncated images; I think it loads the missing parts as a greyed area without any warning.
I read that ImageMagick has an external command, identify, that could do the job, but I have not found a way to invoke it programmatically and I have not tested this route.
I suggest always performing a preliminary check that the file size is not zero (or very small); it is a very cheap test:
import os
statfile = os.stat(filename)
filesize = statfile.st_size
if filesize == 0:
    # manage the 'faulty image' case here
    pass
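The size pre-check can be wrapped in a small helper. This is only a sketch: the name is_too_small and the min_bytes threshold are illustrative assumptions, not part of the answer, and the demonstration files are throwaway data rather than real images.

```python
import os
import tempfile

def is_too_small(path, min_bytes=1):
    """Return True when the file at `path` holds fewer than `min_bytes` bytes."""
    return os.stat(path).st_size < min_bytes

# Demonstration on two throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    empty_path = os.path.join(tmp, 'empty.jpg')
    open(empty_path, 'wb').close()  # zero-byte file

    data_path = os.path.join(tmp, 'data.jpg')
    with open(data_path, 'wb') as fh:
        fh.write(b'\xff\xd8\xff' + b'\x00' * 100)  # 103 bytes, not a real image

    empty_flagged = is_too_small(empty_path)
    data_flagged = is_too_small(data_path)
    data_flagged_strict = is_too_small(data_path, min_bytes=1024)

print(empty_flagged, data_flagged, data_flagged_strict)
```

A file that passes this check can still be corrupt, so it only serves as a cheap filter before the slower Pillow checks.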

You are building a bad path with
img=Image.open('/Users/ajinkyabobade/Desktop/2'+filename)
Try the following instead (by adding / to the end of the directory path)
img=Image.open('/Users/ajinkyabobade/Desktop/2/'+filename)
or
img=Image.open(os.path.join('/Users/ajinkyabobade/Desktop/2', filename))
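To see why the original concatenation fails, here is a small sketch that only builds the two paths and prints them. The file name photo.JPG is a made-up example.

```python
import os

# Directory from the question; 'photo.JPG' is a hypothetical file name.
directory = '/Users/ajinkyabobade/Desktop/2'
filename = 'photo.JPG'

# Plain concatenation glues the file name onto the directory name.
concatenated = directory + filename

# os.path.join inserts the separator for the current platform.
joined = os.path.join(directory, filename)

print(concatenated)
print(joined)
```

The concatenated path points at a file that does not exist, so every Image.open call fails and every file is reported as bad.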

Try the code below; it worked fine for me. It identifies bad/corrupted images and removes them as well. If you prefer, you can just print the bad/corrupted file names and drop the final line that deletes the file.
import os
from os import listdir
from PIL import Image
for filename in listdir('/Users/ajinkyabobade/Desktop/2/'):
    if filename.endswith('.JPG'):
        try:
            img = Image.open('/Users/ajinkyabobade/Desktop/2/'+filename) # open the image file
            img.verify() # verify that it is, in fact an image
        except (IOError, SyntaxError) as e:
            print(filename)
            os.remove('/Users/ajinkyabobade/Desktop/2/'+filename)

I am getting an error that tells me that Image.load is not available. Image.open appears to work.
I was also getting errors using:
except (IOError, SyntaxError) as e:
I just changed that to:
except:
and it worked fine.

Related

How can I pytest a function that creates a list of objects?

I have a function that takes a directory of images, reads them, and stores them in a list.
When pytesting a basic example that reads 3 images, I can't pass the test because the image objects carry a memory address that makes the assertion fail.
import os
from PIL import Image
def getImages(imageDir):
    files = os.listdir(imageDir)
    images = []
    for file in files:
        # Getting the full image name
        filePath = os.path.abspath(os.path.join(imageDir, file))
        try:
            # explicit load to prevent resources crunch
            fp = open(filePath, "rb")
            im = Image.open(fp)
            images.append(im)
            # force loading the image data from file
            im.load()
            # close the file
            fp.close()
        except Exception:
            # skip
            print("Invalid image: %s" % (filePath,))
    return images
def test_for_clean_data():
    assert getImages("test_images") == [Image.open("test_images/01.jpg"),
                                        Image.open("test_images/02.jpg"),
                                        Image.open("test_images/03.jpg")]
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2448x2765 at 0x1E48F5872C8>
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2448x2765 at 0x1E48F5878C8>
As shown in the example error from the console, the same image has different properties when tested.
The function under test is based on PIL.Image.
Perhaps, as someone suggested, the test is flawed from the start. If anyone knows a better way to pytest that the function works properly, I would be more than happy to try a new idea. There's so much to learn.
Suggestions for correct test naming are also welcome.

Eyeballing the code, it looks like one potential reason your objects differ is that you call im.load() in getImages(), but not when you open your test images. Does this work? This is just a quick guess, I haven't tested it.
assert getImages("test_images") == [Image.open("test_images/01.jpg").load(),
                                    Image.open("test_images/02.jpg").load(),
                                    Image.open("test_images/03.jpg").load()]

Why ImageField in form always triggers "invalid_image"?

I've implemented ImageField to upload images using Pillow verification in Django 1.8. For some reason, I can't submit the form. It always raises this ValidationError in the form (but with FileField this would work):
Upload a valid image. The file you uploaded was either not an image or a corrupted image.
The weird part of all this is that the ImageField.check method seems to obtain correct MIME type! (see below)
WHAT I TRIED
I've tried with JPG, GIF, and PNG formats; none worked.
So I tried to print some variables in django.forms.fields.ImageField modifying the try statement that triggers this error, adding print statements for testing:
try:
    # load() could spot a truncated JPEG, but it loads the entire
    # image in memory, which is a DoS vector. See #3848 and #18520.
    image = Image.open(file)
    # verify() must be called immediately after the constructor.
    damnit = image.verify()
    print 'MY_LOG: verif=', damnit
    # Annotating so subclasses can reuse it for their own validation
    f.image = image
    # Pillow doesn't detect the MIME type of all formats. In those
    # cases, content_type will be None.
    f.content_type = Image.MIME.get(image.format)
    print 'MY_LOG: image_format=', image.format
    print 'MY_LOG: content_type=', f.content_type
Then I submit a form again to trigger the error after running python manage.py runserver and obtain these lines:
MY_LOG: verif= None
MY_LOG: image_format= JPEG
MY_LOG: content_type= image/jpeg
The image is correctly identified by Pillow and the try statement is executed up to its last line... and still the except statement is triggered? It makes no sense!
Using the same tactic, I tried to obtain some useful log output from django.db.models.fields.files.ImageField and each of its parents up to Field, printing their error lists... all of them empty!
MY QUESTION
Is there anything else I can try to spot what is triggering the ValidationError?
SOME CODE
models.py
class MyImageModel(models.Model):
    # Using FileField instead would mean successful upload
    the_image = models.ImageField(upload_to="public_uploads/", blank=True, null=True)
views.py
from django.views.generic.edit import CreateView
from django.forms.models import modelform_factory
class MyFormView(CreateView):
    model = MyImageModel
    form_class = modelform_factory(MyImageModel,
                                   widgets={}, fields=['the_image'])
EDIT:
After trying the tactic suggested by @Alasdair, I obtained this report from e.message:
cannot identify image file <_io.BytesIO object at 0x7f9d52bbc770>
However, the file is successfully uploaded even though I'm not allowed to submit the form. It looks as if, somehow, the path to the image wasn't processed correctly (or something else hinders the image loading in these lines).
I think something is probably failing on these lines (from django.forms.fields.ImageField):
# We need to get a file object for Pillow. We might have a path or we might
# have to read the data into memory.
if hasattr(data, 'temporary_file_path'):
    file = data.temporary_file_path()
else:
    if hasattr(data, 'read'):
        file = BytesIO(data.read())
    else:
        file = BytesIO(data['content'])
If I explore which properties this BytesIO class has, maybe I can extract some relevant information about the error...
EDIT2
data attribute arrives empty! Determining why won't be easy...

From the Django documentation:
Using an ImageField requires that Pillow is installed with support for the image formats you use. If you encounter a corrupt image error when you upload an image, it usually means that Pillow doesn’t understand its format. To fix this, install the appropriate library and reinstall Pillow.
So first, you should install Pillow instead of PIL (Pillow is a fork of PIL), and second, when installing, make sure that all the libraries Pillow requires to "understand" the various image formats are installed.
For the list of dependencies, see the Pillow documentation.

After a lot of thought, analysis of the code involved, and trial and error, I tried editing this line from the try / except block shown in the question (in django.forms.fields.ImageField) like this:
# Before my edit
image = Image.open(file)
# After my edit
image = Image.open(f)
This fixed my issue. Now everything works well and I can submit the form. Invalid files are correctly rejected with the corresponding ValidationError.
MY GUESS ABOUT HOW THIS COULD HAPPEN
I'm not sure if I'm guessing right, but:
I think this worked because that line used the wrong variable name. In addition, using file as a variable name looks like a mistake, because file seems to be the name of an existing built-in.
If my guess is right, maybe I should report this issue to the Django developers.

OpenCV Python not opening images with imread()

I'm not entirely sure why this is happening but I am in the process of making a program and I am having tons of issues trying to get opencv to open images using imread. I keep getting errors saying that the image is 0px wide by 0px high. This isn't making much sense to me so I searched around on here and I'm not getting any answers from SO either.
I have taken about 20 pictures and they are all using the same device. Probably 8 of them actually open and work correctly, the rest don't. They aren't corrupted either because they open in other programs. I have triple checked the paths and they are using full paths.
Is anyone else having issues like this? All of my files are .jpgs and I am not seeing any problems on my end. Is this a bug or am I doing something wrong?
Here is a snippet of the code that I am using that is reproducing the error on my end.
imgloc = "F:\Kyle\Desktop\Coinjar\Test images\ten.png"
img = cv2.imread(imgloc)
cv2.imshow('img',img)
When I change the file, I only adjust the file name; the rest of the path doesn't change. It just refuses to accept some of my images, which are essentially the same as the ones that work.
I am getting this error from a later part of the code where I try to use img.shape:
Traceback (most recent call last):
File "F:\Kyle\Desktop\Coinjar\CoinJar Test2.py", line 14, in <module>
height, width, depth = img.shape
AttributeError: 'NoneType' object has no attribute 'shape'
and I am getting this error when I try to show a window from the code snippet above.
Traceback (most recent call last):
File "F:\Kyle\Desktop\Coinjar\CoinJar Test2.py", line 11, in <module>
cv2.imshow('img',img)
error: ..\..\..\..\opencv\modules\highgui\src\window.cpp:261: error: (-215) size.width>0 && size.height>0 in function cv::imshow

You probably have a problem with the special meaning of \ in strings, as in \t or \n.
Use \\ in place of \
imgloc = "F:\\Kyle\\Desktop\\Coinjar\\Test images\\ten.png"
or use the r'' prefix (which treats the string as raw text without escape codes)
imgloc = r"F:\Kyle\Desktop\Coinjar\Test images\ten.png"
EDIT:
Some modules even accept / as in a Linux path
imgloc = "F:/Kyle/Desktop/Coinjar/Test images/ten.png"
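The effect is easy to reproduce without OpenCV. The sketch below uses only the tail of the path from the question, where the file name ten.png starts with a t, so \t is read as a tab character.

```python
# Same text written four ways; only the first one is damaged by escaping.
plain = "Test images\ten.png"      # "\t" is interpreted as a tab
raw = r"Test images\ten.png"       # raw string keeps the backslash
doubled = "Test images\\ten.png"   # escaped backslash
forward = "Test images/ten.png"    # forward slash needs no escaping

print(repr(plain))
print('\t' in plain)
print(raw == doubled)
```

The plain version is one character shorter than the raw one because the backslash and the t collapse into a single tab, so the resulting path no longer names the file.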

In my experience, file paths that are too long (the limit is OS dependent) can also cause cv2.imread() to fail.
Also, when it does fail, it often fails silently, so it is hard to even notice the failure; usually something further down in the code is what raises the error.
Hope this helps.
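One way to surface the silent failure early is to check the return value right after reading. The sketch below is library-independent: read_image_or_raise and its reader parameter are hypothetical names, and the stand-in reader simply returns None the way the question shows cv2.imread doing.

```python
import os
import tempfile

def read_image_or_raise(path, reader):
    """Call reader(path) and raise instead of silently returning None."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    image = reader(path)
    if image is None:
        raise ValueError("reader could not decode %r" % (path,))
    return image

# Stand-in reader (hypothetical) that fails the same way: it returns None.
def failing_reader(path):
    return None

# Throwaway file so the path exists but the "decode" still fails.
with tempfile.NamedTemporaryFile(suffix='.jpg', delete=False) as fh:
    fh.write(b'not really an image')
    demo_path = fh.name

try:
    read_image_or_raise(demo_path, failing_reader)
    outcome = 'no error'
except ValueError:
    outcome = 'ValueError'
finally:
    os.remove(demo_path)

print(outcome)
```

With OpenCV installed, the call would presumably look like read_image_or_raise(imgloc, cv2.imread), so the error points at the read rather than at a later img.shape.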

Faced the same problem on Windows: cv.imread returned None when reading jpg files from a subfolder. The same code and folder structure worked on Linux.
Found out that cv.imread processes the same jpg files, if they are in the same folder as the python file.
My workaround:
copy the image file to the python file folder
use this file in cv.imread
remove redundant image file
import os
import shutil
import cv2 as cv
image_dir = os.path.join('path', 'to', 'image')
image_filename = 'image.jpg'
full_image_path = os.path.join(image_dir, image_filename)
image = cv.imread(full_image_path)
if image is None:
    shutil.copy(full_image_path, image_filename)
    image = cv.imread(image_filename)
    os.remove(image_filename)
...

I had a lot of trouble with cv.imread() not finding my image. I think I tried everything involving changing the path. The os.path.exists(file_path) function also returned True.
I finally solved the problem by loading the images with imageio.
img = imageio.imread('file_path')
This also loads the image into a numpy array, and you can use functions like cv.matchTemplate() on this object. However, if you are working with multiple images, I would recommend reading all of them with imageio, because I found differences in the arrays produced by .imread() from the two libraries (opencv, imageio) for a file that both of them could open.
I hope this helps someone.

Take care to:
try imread() with a reliable picture,
and use the correct path for your context (see Kyle772's answer). For me either / or \\ works.
I lost a couple of hours trying with 2 images saved from a browser with a left click. As soon as I used an image from my own camera, it worked fine.
#context windows10 / anaconda / python 3.2.0
import cv2
print(cv2.__version__) # 3.2.0
imgloc = "D:/violettes/Software/Central/test.jpg" #this path works fine.
# imgloc = "D:\\violettes\\Software\\Central\\test.jpg" this path works fine also.
#imgloc = "D:\violettes\Software\Central\test.jpg" #this path fails.
img = cv2.imread(imgloc)
height, width, channels = img.shape
print (height, width, channels)

I know that the question is already answered, but in case anybody is still not able to load images with imread: it may be because the path string contains characters which imread does not accept.
For example, umlauts and diacritical marks.

My suggestion for everyone facing the same problem is to try this:
cv2.imshow("image", img)
The img is keyword. Never forget.

When you get error like this AttributeError: 'NoneType' object has no attribute 'shape'
Try with new_image=image.copy

PIL: How to reopen an image after verifying?

I need to open an image, verify it, then reopen it (see the last sentence of the quote below from the PIL docs):
im.verify()
Attempts to determine if the file is broken, without actually decoding
the image data. If this method finds any problems, it raises suitable
exceptions. This method only works on a newly opened image; if the
image has already been loaded, the result is undefined. Also, if you
need to load the image after using this method, you must reopen the
image file.
This is what I have in my code, where picture is a django InMemoryUploadedFile object:
img = Image.open(picture)
img.verify()
img = Image.open(picture)
The first two lines work fine, but I get the following error for the third line (where I'm attempting to "reopen" the image):
IOError: cannot identify image file
What is the proper way to reopen the image file, as the docs suggest?

This is no different than doing
f = open('x.png', 'rb')
Image.open(f)
Image.open(f)
The code above does not work because PIL advances in the file while reading its first few bytes to (attempt to) identify its format. Trying to use a second Image.open in this situation will fail as noted because now the current position in the file is past its image's header. To confirm this, you can verify what f.tell() returns. To solve this issue you have to go back to the start of the file either by doing f.seek(0) between the two calls to Image.open, or closing and reopening the file.
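The file-position effect can be shown without PIL. This sketch reads from an in-memory stream that starts with the 8-byte PNG signature followed by filler bytes; it is not a complete image, and the 8-byte read merely stands in for whatever a format check consumes.

```python
import io

# Eight bytes of PNG signature followed by filler; not a complete image.
stream = io.BytesIO(b'\x89PNG\r\n\x1a\n' + b'\x00' * 16)

first_read = stream.read(8)      # the header bytes are consumed here
position_after = stream.tell()   # the position has moved past the header
second_read = stream.read(8)     # filler bytes, not the signature

stream.seek(0)                   # rewind before using the stream again
third_read = stream.read(8)      # the signature is readable again

print(position_after, first_read == third_read, second_read == first_read)
```

Applied to the question, that would mean calling picture.seek(0) before the second Image.open(picture), assuming the uploaded file object supports seek.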

Try doing a del img between the verify and second open.

How to check if a file is a valid image file?

I am currently using PIL.
from PIL import Image
try:
    im = Image.open(filename)
    # do stuff
except IOError:
    # filename is not an image file
    pass
However, while this covers most cases, some image files such as xcf, svg and psd are not being detected. PSD files throw an OverflowError exception.
Is there some way I could include them as well?

I have just found the builtin imghdr module. From python documentation:
The imghdr module determines the type
of image contained in a file or byte
stream.
This is how it works:
>>> import imghdr
>>> imghdr.what('/tmp/bass')
'gif'
Using a module is much better than reimplementing similar functionality.
UPDATE: imghdr is deprecated as of Python 3.11 and was removed in Python 3.13.

In addition to what Brian is suggesting you could use PIL's verify method to check if the file is broken.
im.verify()
Attempts to determine if the file is
broken, without actually decoding the
image data. If this method finds any
problems, it raises suitable
exceptions. This method only works on
a newly opened image; if the image has
already been loaded, the result is
undefined. Also, if you need to load
the image after using this method, you
must reopen the image file.

In addition to the PIL image check, you can also add a file name extension check like this:
filename.lower().endswith(('.png', '.jpg', '.jpeg', '.tiff', '.bmp', '.gif'))
Note that this only checks whether the file name has a valid image extension; it does not actually open the image to see whether it is a valid image. That is why you also need PIL or one of the libraries suggested in the other answers.

A lot of times the first couple chars will be a magic number for various file formats. You could check for this in addition to your exception checking above.
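A minimal sketch of such a magic-number check follows. The signatures listed are the well-known leading bytes of PNG, JPEG, GIF and BMP files; the names SIGNATURES, sniff_image_type and sniff_file are illustrative, and the table is deliberately incomplete.

```python
# Leading bytes of a few common image formats (not an exhaustive list).
SIGNATURES = {
    b'\x89PNG\r\n\x1a\n': 'png',
    b'\xff\xd8\xff': 'jpeg',
    b'GIF87a': 'gif',
    b'GIF89a': 'gif',
    b'BM': 'bmp',
}

def sniff_image_type(header):
    """Return a format name for the leading bytes, or None if unrecognised."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None

def sniff_file(path):
    """Read the first bytes of the file at `path` and sniff its type."""
    with open(path, 'rb') as fh:
        return sniff_image_type(fh.read(16))

print(sniff_image_type(b'\x89PNG\r\n\x1a\n\x00\x00'))
print(sniff_image_type(b'plain text'))
```

A matching signature only says what the file claims to be; a truncated or damaged file can still start with the right bytes.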

One option is to use the filetype package.
Installation
python -m pip install filetype
Advantages
Fast: Does its work by loading only the first few bytes of your image (check on the magic number)
Supports different mime type: Images, Videos, Fonts, Audio, Archives.
Example
filetype >= 1.0.7
import filetype
filename = "/path/to/file.jpg"
if filetype.is_image(filename):
    print(f"{filename} is a valid image...")
elif filetype.is_video(filename):
    print(f"{filename} is a valid video...")
filetype <= 1.0.6
import filetype
filename = "/path/to/file.jpg"
if filetype.image(filename):
    print(f"{filename} is a valid image...")
elif filetype.video(filename):
    print(f"{filename} is a valid video...")
Additional information on the official repo: https://github.com/h2non/filetype.py

Update
I also implemented the following solution in my Python script here on GitHub.
I also verified that damaged JPG files are frequently not 'broken' images: a damaged picture file sometimes remains a legitimate picture file. The original image is lost or altered, but you can still load it with no errors. However, file truncation always causes errors.
End Update
You can use the Python Pillow (PIL) module, with most image formats, to check whether a file is a valid and intact image file.
If you also aim to detect broken images, @Nadia Alramli correctly suggests the im.verify() method, but it does not detect all possible image defects; for example, im.verify() does not detect truncated images (which most viewers load with a greyed area).
Pillow can detect these types of defects too, but you have to apply an image manipulation or an image decode/recode in order to trigger the check. In the end I suggest using this code:
from PIL import Image
try:
    im = Image.open(filename)  # Image.open takes the path (Image.load is not a module-level function)
    im.verify()  # I also run verify; I don't know whether it catches other types of defects
    im.close()  # reopening is necessary in my case
    im = Image.open(filename)
    im.transpose(Image.FLIP_LEFT_RIGHT)
    im.close()
except Exception:
    # manage exceptions here
    pass
If the image is defective, this code raises an exception.
Note that im.verify() is about 100 times faster than performing the image manipulation (and I think a flip is one of the cheaper transformations).
With this code you can verify a set of images at about 10 MBytes/sec with standard Pillow or 40 MBytes/sec with the Pillow-SIMD module (modern 2.5 GHz x86_64 CPU).
For other formats (xcf, ...) you can use the ImageMagick wrapper Wand; the code is as follows:
Check the Wand documentation here and the installation instructions here.
import wand.image
im = wand.image.Image(filename=filename)
im.flip()  # flip is a method; call it to apply the transformation
im.close()
However, in my experiments Wand does not detect truncated images; I think it loads the missing parts as a greyed area without any warning.
I read that ImageMagick has an external command, identify, that could do the job, but I have not found a way to invoke it programmatically and I have not tested this route.
I suggest always performing a preliminary check that the file size is not zero (or very small); it is a very cheap test:
import os
statfile = os.stat(filename)
filesize = statfile.st_size
if filesize == 0:
    # manage the 'faulty image' case here
    pass

On Linux, you could use python-magic which uses libmagic to identify file formats.
AFAIK, libmagic looks into the file and tries to tell you more about it than just the format, like bitmap dimensions, format version etc.. So you might see this as a superficial test for "validity".
For other definitions of "valid" you might have to write your own tests.

You could use the Python bindings to libmagic, python-magic and then check the mime types. This won't tell you if the files are corrupted or intact but it should be able to determine what type of image it is.

Adapting from Fabiano and Tiago's answer.
from PIL import Image
def check_img(filename):
    try:
        im = Image.open(filename)
        im.verify()
        im.close()
        im = Image.open(filename)
        im.transpose(Image.FLIP_LEFT_RIGHT)
        im.close()
        return True
    except:
        print(filename,'corrupted')
        return False
if not check_img('/dir/image'):
    print('do something')

The file extension can be used to check for image files as follows.
import os
for f in os.listdir(folderPath):
    if (".jpg" in f) or (".bmp" in f):
        filePath = os.path.join(folderPath, f)
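A substring test such as ".jpg" in f also matches names like notes.jpg.txt. A slightly stricter sketch compares only the final extension; IMAGE_EXTENSIONS and has_image_extension are illustrative names, and the set of extensions is an assumption to adjust as needed.

```python
import os

# Extensions treated as images in this sketch (adjust as needed).
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp'}

def has_image_extension(name):
    """True when the final extension of `name` is in IMAGE_EXTENSIONS."""
    return os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS

for candidate in ('photo.JPG', 'notes.jpg.txt', 'archive'):
    print(candidate, has_image_extension(candidate))
```

As with any extension check, this looks only at the name and says nothing about the file's contents.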

import os
format = [".jpg",".png",".jpeg"]
for (path,dirs,files) in os.walk(path):
    for file in files:
        if file.endswith(tuple(format)):
            print(path)
            print ("Valid",file)
        else:
            print(path)
            print("InValid",file)
