How to check if a file is a valid image file? - python

I am currently using PIL.
from PIL import Image
try:
    im = Image.open(filename)
    # do stuff
except IOError:
    # filename not an image file
    pass
However, while this covers most cases, some image formats such as xcf, svg and psd are not detected. Psd files throw an OverflowError exception.
Is there some way I could include them as well?

I have just found the builtin imghdr module. From python documentation:
The imghdr module determines the type
of image contained in a file or byte
stream.
This is how it works:
>>> import imghdr
>>> imghdr.what('/tmp/bass')
'gif'
Using a module is much better than reimplementing similar functionality.
UPDATE: imghdr is deprecated as of Python 3.11 and was removed in Python 3.13.

In addition to what Brian is suggesting you could use PIL's verify method to check if the file is broken.
im.verify()
Attempts to determine if the file is
broken, without actually decoding the
image data. If this method finds any
problems, it raises suitable
exceptions. This method only works on
a newly opened image; if the image has
already been loaded, the result is
undefined. Also, if you need to load
the image after using this method, you
must reopen the image file.

In addition to the PIL image check, you can also check the file name extension like this:
filename.lower().endswith(('.png', '.jpg', '.jpeg', '.tiff', '.bmp', '.gif'))
Note that this only checks whether the file name has a valid image extension. It does not open the file to see whether it is a valid image, which is why you also need PIL or one of the libraries suggested in the other answers.

Often the first few bytes of a file are a magic number that identifies the file format. You could check for this in addition to your exception checking above.
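A minimal sketch of such a magic-number check, using only the standard library. The signature table is illustrative and far from exhaustive, and the names SIGNATURES, sniff_image_type and sniff_file are hypothetical helpers, not part of any library. A match only says the file starts like that format; it does not prove the file is intact.

```python
# Leading-byte signatures for a few common formats (illustrative, not exhaustive).
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
    b"8BPS": "psd",
    b"gimp xcf": "xcf",
}


def sniff_image_type(header):
    """Return a format name if the leading bytes match a known signature, else None."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def sniff_file(path):
    """Sniff the format of the file at path by reading only its first 16 bytes."""
    with open(path, "rb") as f:
        return sniff_image_type(f.read(16))
```

Because psd and xcf have signatures of their own, this kind of check can recognise them even when PIL cannot open them.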

One option is to use the filetype package.
Installation
python -m pip install filetype
Advantages
Fast: works by loading only the first few bytes of the file (it checks the magic number)
Supports different MIME types: images, videos, fonts, audio, archives.
Example
filetype >= 1.0.7
import filetype
filename = "/path/to/file.jpg"
if filetype.is_image(filename):
    print(f"{filename} is a valid image...")
elif filetype.is_video(filename):
    print(f"{filename} is a valid video...")
filetype <= 1.0.6
import filetype
filename = "/path/to/file.jpg"
if filetype.image(filename):
    print(f"{filename} is a valid image...")
elif filetype.video(filename):
    print(f"{filename} is a valid video...")
Additional information on the official repo: https://github.com/h2non/filetype.py

Update
I also implemented the following solution in my Python script here on GitHub.
I also verified that damaged files (jpg) are frequently not 'broken' images, i.e., a damaged picture file sometimes remains a legitimate picture file: the original image is lost or altered, but you can still load it with no errors. File truncation, however, always causes errors.
End Update
You can use the Python Pillow (PIL) module, with most image formats, to check whether a file is a valid and intact image file.
If you also want to detect broken images, @Nadia Alramli correctly suggests the im.verify() method, but it does not detect all possible image defects. For example, im.verify() does not detect truncated images (which most viewers load with a greyed area).
Pillow is able to detect this type of defect too, but you have to apply an image manipulation or an image decode/recode in order to trigger the check. In the end I suggest using this code:
from PIL import Image
try:
    im = Image.open(filename)
    im.verify()  # I also run verify; I don't know whether it catches other types of defects
    im.close()  # reopening is necessary in my case
    im = Image.open(filename)
    im.transpose(Image.FLIP_LEFT_RIGHT)
    im.close()
except Exception:
    # manage exceptions here
    pass
In case of image defects this code will raise an exception.
Please consider that im.verify is about 100 times faster than performing the image manipulation (and I think that flip is one of the cheaper transformations).
With this code you can verify a set of images at about 10 MBytes/sec with standard Pillow or 40 MBytes/sec with the Pillow-SIMD module (modern 2.5 GHz x86_64 CPU).
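Since truncation is the defect that verify() misses, a much cheaper pre-check is to look for the end-of-file marker before decoding anything. This is a different technique from the Pillow code above, not a replacement for it, and it is only a heuristic sketch: some valid JPEG files carry extra bytes after the end marker and would be reported as truncated. The function name looks_truncated is hypothetical.

```python
JPEG_EOI = b"\xff\xd9"                        # JPEG end-of-image marker
PNG_IEND = b"\x00\x00\x00\x00IEND\xaeB`\x82"  # final PNG chunk, including its CRC


def looks_truncated(data):
    """Heuristic: True/False for JPEG or PNG bytes, None for any other format."""
    if data.startswith(b"\xff\xd8"):
        return not data.endswith(JPEG_EOI)
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return not data.endswith(PNG_IEND)
    return None  # format not covered by this sketch
```

A file that passes this check can still be corrupt in the middle, so the decode-based check remains the stronger test.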
For the other formats (xcf, ...) you can use the ImageMagick wrapper Wand; see the Wand documentation for installation instructions. The code is as follows:
import wand.image
im = wand.image.Image(filename=filename)
im.flip()
im.close()
However, in my experiments Wand does not detect truncated images; I think it loads the missing parts as a greyed area without warning.
I read that ImageMagick has an external command, identify, that could do the job, but I have not found a way to invoke it programmatically and I have not tested this route.
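One way an external command such as identify could be invoked from Python is the standard subprocess module. The sketch below is untested against damaged files, so whether identify actually reports truncated images remains an open question. It assumes the executable is called identify and is on PATH (some ImageMagick installations expose it differently), and the function name identify_ok is hypothetical.

```python
import shutil
import subprocess


def identify_ok(path, exe="identify"):
    """Run an external identify command on path.

    Returns True if the command exits with status 0, False if it exits with
    any other status, and None if the executable cannot be found on PATH.
    """
    if shutil.which(exe) is None:
        return None
    result = subprocess.run(
        [exe, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Returning None when the tool is missing keeps "could not check" separate from "check failed", which matters when the script runs on machines without ImageMagick.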
I suggest always performing a preliminary check that the file size is not zero (or very small), which is very cheap:
import os
statfile = os.stat(filename)
filesize = statfile.st_size
if filesize == 0:
    # manage the 'faulty image' case here
    pass

On Linux, you could use python-magic which uses libmagic to identify file formats.
AFAIK, libmagic looks into the file and tries to tell you more about it than just the format, such as bitmap dimensions, format version, etc. So you might see this as a superficial test for "validity".
For other definitions of "valid" you might have to write your own tests.

You could use the Python bindings to libmagic, python-magic and then check the mime types. This won't tell you if the files are corrupted or intact but it should be able to determine what type of image it is.

Adapting from Fabiano and Tiago's answer.
from PIL import Image

def check_img(filename):
    try:
        im = Image.open(filename)
        im.verify()
        im.close()
        im = Image.open(filename)
        im.transpose(Image.FLIP_LEFT_RIGHT)
        im.close()
        return True
    except Exception:
        print(filename, 'corrupted')
        return False

if not check_img('/dir/image'):
    print('do something')

The file extension can be used to check for image files as follows.
import os
for f in os.listdir(folderPath):
    # endswith avoids matching names such as "notes.jpg.txt"
    if f.lower().endswith((".jpg", ".bmp")):
        filePath = os.path.join(folderPath, f)

import os
formats = (".jpg", ".png", ".jpeg")
# dirpath is the directory currently being walked; path is the root to search
for (dirpath, dirs, files) in os.walk(path):
    for file in files:
        print(dirpath)
        if file.endswith(formats):
            print("Valid", file)
        else:
            print("Invalid", file)
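Extension checks are easy to get subtly wrong, for example through case sensitivity or by matching ".jpg" anywhere in the name rather than at the end. A small sketch using pathlib compares only the final suffix, case-insensitively. The set IMAGE_SUFFIXES and the function has_image_extension are hypothetical names, and the list of suffixes is an assumption to adapt.

```python
from pathlib import Path

# Extensions treated as images in this sketch; adjust to the formats you accept.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff"}


def has_image_extension(name):
    """True if the final suffix of name is a known image extension (case-insensitive)."""
    return Path(name).suffix.lower() in IMAGE_SUFFIXES
```

As with every extension check, this says nothing about the file contents; it only filters candidates before a real check.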

Related

Python Script to detect broken images

I wrote a Python script to detect broken images and count them.
The problem is that my script detects all the images but does not detect the broken ones. How can I fix this? I referred to
"How to check if a file is a valid image file?" for my code.
My code:
import os
from os import listdir
from PIL import Image
count=0
for filename in os.listdir('/Users/ajinkyabobade/Desktop/2'):
    if filename.endswith('.JPG'):
        try:
            img=Image.open('/Users/ajinkyabobade/Desktop/2'+filename)
            img.verify()
        except(IOError,SyntaxError)as e:
            print('Bad file : '+filename)
            count=count+1
print(count)
I have added another SO answer here that extends the PIL solution to better detect broken images.
I also implemented this solution in my Python script here on GitHub.
I also verified that damaged files (jpg) are frequently not 'broken' images, i.e., a damaged picture file sometimes remains a legitimate picture file: the original image is lost or altered, but you can still load it.
I quote the other answer for completeness:
You can use the Python Pillow (PIL) module, with most image formats, to check whether a file is a valid and intact image file.
If you also want to detect broken images, @Nadia Alramli correctly suggests the im.verify() method, but it does not detect all possible image defects. For example, im.verify() does not detect truncated images (which most viewers load with a greyed area).
Pillow is able to detect this type of defect too, but you have to apply an image manipulation or an image decode/recode in order to trigger the check. In the end I suggest using this code:
from PIL import Image
try:
    im = Image.open(filename)
    im.verify()  # I also run verify; I don't know whether it catches other types of defects
    im.close()  # reopening is necessary in my case
    im = Image.open(filename)
    im.transpose(Image.FLIP_LEFT_RIGHT)
    im.close()
except Exception:
    # manage exceptions here
    pass
In case of image defects this code will raise an exception.
Please consider that im.verify is about 100 times faster than performing the image manipulation (and I think that flip is one of the cheaper transformations).
With this code you can verify a set of images at about 10 MBytes/sec (modern 2.5 GHz x86_64 CPU).
For the other formats (psd, xcf, ...) you can use the ImageMagick wrapper Wand. The code is as follows:
import wand.image
im = wand.image.Image(filename=filename)
im.flip()
im.close()
However, in my experiments Wand does not detect truncated images; I think it loads the missing parts as a greyed area without warning.
I read that ImageMagick has an external command, identify, that could do the job, but I have not found a way to invoke it programmatically and I have not tested this route.
I suggest always performing a preliminary check that the file size is not zero (or very small), which is very cheap:
import os
statfile = os.stat(filename)
filesize = statfile.st_size
if filesize == 0:
    # manage the 'faulty image' case here
    pass
You are building a bad path with
img=Image.open('/Users/ajinkyabobade/Desktop/2'+filename)
Try the following instead (by adding / to the end of the directory path)
img=Image.open('/Users/ajinkyabobade/Desktop/2/'+filename)
or
img=Image.open(os.path.join('/Users/ajinkyabobade/Desktop/2', filename))
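To make the difference concrete, the short sketch below builds the path both ways. The file name photo.JPG is a hypothetical example; the directory is the one from the question.

```python
import os

directory = '/Users/ajinkyabobade/Desktop/2'
filename = 'photo.JPG'  # hypothetical file name, for illustration only

# Plain concatenation glues the file name onto the last directory component.
broken = directory + filename
# os.path.join inserts the path separator between the two parts.
fixed = os.path.join(directory, filename)

print(broken)  # /Users/ajinkyabobade/Desktop/2photo.JPG
print(fixed)
```

The concatenated path names a file called 2photo.JPG on the Desktop, which does not exist, so Image.open fails for every file regardless of whether the image is broken.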
Try the code below; it worked fine for me. It identifies the bad/corrupted images and removes them as well. If you only want to print the names of the bad/corrupted files, remove the final line that deletes the file.
for filename in listdir('/Users/ajinkyabobade/Desktop/2/'):
    if filename.endswith('.JPG'):
        try:
            img = Image.open('/Users/ajinkyabobade/Desktop/2/'+filename)  # open the image file
            img.verify()  # verify that it is, in fact, an image
        except (IOError, SyntaxError) as e:
            print(filename)
            os.remove('/Users/ajinkyabobade/Desktop/2/'+filename)
I am getting an error that tells me that Image.load is not available. Image.open appears to work.
I was also getting errors using:
except (IOError, SyntaxError) as e:
I just changed that to:
except:
and it worked fine.

Image format on imgur

I'm playing around in Python trying to download some images from imgur. I've been using urllib and urllib.urlretrieve, but you need to specify the extension when saving the file. This isn't a problem for most posts, since the link has, for example, .jpg in it, but I'm not sure what to do when the extension isn't there. My question is whether there is any way to determine the image format of the file before downloading it. The question is mostly imgur-specific, but I wouldn't mind a solution that works for most image-hosting sites.
Thanks in advance
You can use imghdr.what(filename[, h]) in Python 2.7 and Python 3 (up to 3.12; the module was removed in 3.13) to determine the image type.
Read here for more info, if you're using Python 2.7.
Read here for more info, if you're using Python 3.
Assuming the picture has no file extension, there's no way to determine which type it is before you download it. Most image formats set their initial bytes to a particular value. To inspect these 'magic' initial bytes, check out https://github.com/ahupp/python-magic, which matches the initial bytes against known file formats.
The code below downloads a picture from imgur and determines which file type it is.
import os
import shutil
import magic
import requests
path = os.path.expanduser('~/Desktop/picture')  # open() does not expand '~' by itself
r = requests.get('http://i.imgur.com/yed5Zfk.gif', stream=True)  # Download picture
if r.status_code == 200:
    with open(path, 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)
    print(magic.from_file(path))  # Determine type
    # Prints: 'GIF image data, version 89a, 360 x 270'

Why ImageField in form always triggers "invalid_image"?

I've implemented ImageField to upload images using Pillow verification in Django 1.8. For some reason, I can't submit the form. It always raises this ValidationError in the form (but with FileField this would work):
Upload a valid image. The file you uploaded was either not an image or a corrupted image.
The weird part of all this is that the ImageField.check method seems to obtain the correct MIME type! (see below)
WHAT I TRIED
I've tried with JPG, GIF, and PNG formats; none worked.
So I tried to print some variables in django.forms.fields.ImageField modifying the try statement that triggers this error, adding print statements for testing:
try:
    # load() could spot a truncated JPEG, but it loads the entire
    # image in memory, which is a DoS vector. See #3848 and #18520.
    image = Image.open(file)
    # verify() must be called immediately after the constructor.
    damnit = image.verify()
    print 'MY_LOG: verif=', damnit
    # Annotating so subclasses can reuse it for their own validation
    f.image = image
    # Pillow doesn't detect the MIME type of all formats. In those
    # cases, content_type will be None.
    f.content_type = Image.MIME.get(image.format)
    print 'MY_LOG: image_format=', image.format
    print 'MY_LOG: content_type=', f.content_type
Then I submit a form again to trigger the error after running python manage.py runserver and obtain these lines:
MY_LOG: verif= None
MY_LOG: image_format= JPEG
MY_LOG: content_type= image/jpeg
The image is correctly identified by Pillow and the try statement is executed up to its last line... and still the except statement is triggered? It makes no sense!
Using the same tactic, I tried to obtain some useful log output from django.db.models.fields.files.ImageField and each of its parents up to Field, printing their error lists... all of them empty!
MY QUESTION
Is there anything else I can try to spot what is triggering the ValidationError?
SOME CODE
models.py
from django.db import models
class MyImageModel(models.Model):
    # Using FileField instead would mean a successful upload
    the_image = models.ImageField(upload_to="public_uploads/", blank=True, null=True)
views.py
from django.views.generic.edit import CreateView
from django.forms.models import modelform_factory
class MyFormView(CreateView):
    model = MyImageModel
    form_class = modelform_factory(MyImageModel,
                                   widgets={}, fields=['the_image'])
EDIT:
After trying the tactic suggested by @Alasdair, I obtained this report from e.message:
cannot identify image file <_io.BytesIO object at 0x7f9d52bbc770>
However, the file is successfully uploaded even though I'm not allowed to submit the form. It looks as if, somehow, the path to the image wasn't processed correctly (or something else hinders the image loading in these lines).
I think something is probably failing on these lines (from django.forms.fields.ImageField):
# We need to get a file object for Pillow. We might have a path or we might
# have to read the data into memory.
if hasattr(data, 'temporary_file_path'):
    file = data.temporary_file_path()
else:
    if hasattr(data, 'read'):
        file = BytesIO(data.read())
    else:
        file = BytesIO(data['content'])
If I explore which properties this BytesIO class has, maybe I can extract some relevant information about the error...
EDIT2
data attribute arrives empty! Determining why won't be easy...
From django documentation:
Using an ImageField requires that Pillow is installed with support for the image formats you use. If you encounter a corrupt image error when you upload an image, it usually means that Pillow doesn’t understand its format. To fix this, install the appropriate library and reinstall Pillow.
So first, you should install Pillow instead of PIL (Pillow is a fork of PIL). Second, when installing, make sure that all the libraries Pillow needs in order to "understand" the various image formats are installed.
For list of dependencies, you can look into Pillow documentation.
After a lot of thought, analysis of the code involved and trial and error, I edited this line from the try / except block shown in the question (in django.forms.fields.ImageField) like this:
# Before my edit
image = Image.open(file)
# After my edit
image = Image.open(f)
This fixed my issue. Now everything works well and I can submit the form. Invalid files are correctly rejected with the corresponding ValidationError.
MY GUESS ABOUT HOW THIS COULD HAPPEN
I'm not sure if I'm guessing right, but:
I think this worked because the line referred to the wrong variable. In addition, using file as a variable name looks like a typo, because file seems to be reserved for an existing built-in.
If my guess is right, maybe I should report this issue to the Django developers.

Python determine file type from http form submission

I want to do an image form submission, and I want to validate on the server side, which is running Python, that the submitted file is actually an image. Is there a simple way to do this in pure Python?
A simple and naive way to do it would be with libmagic (for example the one at https://github.com/ahupp/python-magic). A better way, but it's not native Python and is a very extensive library, would be to use PIL http://www.pythonware.com/products/pil/.
Use PIL:
import sys
from PIL import Image
for infile in sys.argv[1:]:
    try:
        im = Image.open(infile)
        print(infile, im.format, "%dx%d" % im.size, im.mode)
    except IOError:
        pass
From the docs:
The Python Imaging Library supports a wide variety of image file
formats. To read files from disk, use the open function in the Image
module. You don't have to know the file format to open a file. The
library automatically determines the format based on the contents of
the file.

Can't apply image filters on 16-bit TIFs in PIL

I am trying to apply image filters using Python's PIL. The code is straightforward:
im = Image.open(fnImage)
im = im.filter(ImageFilter.BLUR)
This code works as expected on PNGs, JPGs and 8-bit TIFs. However, when I try to apply it to 16-bit TIFs, I get the following error:
ValueError: image has wrong mode
Note that PIL was able to load, resize and save 16-bit TIFs without complaints, so I assume that this problem is filter-related. However, the ImageFilter documentation says nothing about 16-bit support.
Is there any way to solve it?
Your TIFF image's mode is most likely "I;16".
In the current version of ImageFilter, kernels can only be applied to
"L" and "RGB" images (see the source of ImageFilter.py).
Try converting first to another mode:
im = im.convert('L')
If it fails, try:
im.mode = 'I'
im = im.point(lambda i:i*(1./256)).convert('L').filter(ImageFilter.BLUR)
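The factor 1/256 in the lambda is what brings 16-bit samples (0 to 65535) into the 8-bit range (0 to 255) that mode "L" can hold. The arithmetic can be checked in plain Python; scale_16_to_8 is a hypothetical helper that only illustrates the scaling and truncates the result, whereas Pillow's own rounding and clipping during convert may differ.

```python
def scale_16_to_8(value):
    """Map a 16-bit sample (0..65535) to an 8-bit sample (0..255) by dividing by 256."""
    return int(value * (1.0 / 256))


for sample in (0, 255, 256, 32768, 65535):
    print(sample, "->", scale_16_to_8(sample))
```

Without this scaling, every sample above 255 would fall outside the 8-bit range, so most of a typical 16-bit image would end up saturated.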
Remark: Possible duplicate from Python and 16 Bit Tiff
To move ahead, try using ImageMagick; look for the PythonMagick bindings to the program. At the command prompt, you can use convert.exe image-16.tiff -blur 2x2 output.tiff. I didn't manage to install PythonMagick on my Windows OS, as the source needs compiling.
