PIL vs Python-GD for crop and resize - python

I am creating custom images that I later convert to an image pyramid for Seadragon AJAX. The images and image pyramid are created using PIL. It currently takes a few hours to generate the images and image pyramid for approximately 100 pictures that have a combined size of about 32,000,000 by 1000 pixels (yes, the image is very long and narrow). The performance is roughly similar to another algorithm I have tried (i.e. deepzoom.py). I plan to see if python-gd would perform better, since most of its functionality is coded in C (from the GD library). I would assume a significant performance increase, but I am curious to hear the opinion of others. In particular, the resizing and cropping are slow in PIL (with Image.ANTIALIAS). Will this improve considerably if I use Python-GD?
Thanks in advance for the comments and suggestions.
EDIT: The performance difference between PIL and python-GD seems minimal, so I will refactor my code to reduce bottlenecks and add support for multiple processors. I have tested the Python multiprocessing module, and the results are encouraging.
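The multiprocessing approach can be sketched like this. Since every pyramid tile is independent, the work farms out cleanly across processes; `build_tile` here is a hypothetical stand-in for the real PIL crop-and-resize step, not the actual pipeline:

```python
from multiprocessing import Pool

def build_tile(args):
    # Hypothetical per-tile worker: in the real pipeline this would
    # crop one region from the source strip, resize it, and save the
    # tile to disk.  Here it just echoes its coordinates.
    level, x, y = args
    return (level, x, y)

def build_pyramid(jobs, workers=4):
    # Tiles do not depend on each other, so a process pool can map
    # the job list across all cores without any coordination.
    with Pool(workers) as pool:
        return pool.map(build_tile, jobs)
```

With CPU-bound PIL work, processes (rather than threads) are the right tool because of the GIL.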

PIL is mostly in C.
Antialiasing is slow. When you turn off antialiasing, what happens to the speed?
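A minimal timing sketch for that experiment, using a synthetic image (the sizes are arbitrary stand-ins, and `Image.LANCZOS` is the current name for the `ANTIALIAS` filter):

```python
import time
from PIL import Image

img = Image.new("RGB", (4000, 3000))  # stand-in for one source picture

for name, flt in [("NEAREST", Image.NEAREST),
                  ("BILINEAR", Image.BILINEAR),
                  ("LANCZOS", Image.LANCZOS)]:  # LANCZOS == ANTIALIAS
    t0 = time.perf_counter()
    small = img.resize((640, 480), flt)
    print(name, round(time.perf_counter() - t0, 4), "s")
```

Comparing the printed times on your own images shows how much of the cost is the antialias filter itself.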

VIPS includes a fast deepzoom creator. I timed deepzoom.py and on my machine I see:
$ time ./wtc.py
real 0m29.601s
user 0m29.158s
sys 0m0.408s
peak RES 450mb
where wtc.jpg is a 10,000 x 10,000 pixel RGB JPG image, and wtc.py is using these settings.
VIPS is around three times faster and needs a quarter of the memory:
$ time vips dzsave wtc.jpg wtc --overlap 2 --tile-size 128 --suffix .png[compression=0]
real 0m10.819s
user 0m37.084s
sys 0m15.314s
peak RES 100mb
I'm not sure why sys is so much higher.

Related

PIL Open image, causing DecompressionBombError, with lower resolution

I have a problem_page such that
from PIL import Image
problem_page = "/home/rajiv/tmp/kd/pss-images/f1-577.jpg"
img = Image.open(problem_page)
results in
PIL.Image.DecompressionBombError: Image size (370390741 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack.
I'd like to respect the limit and not increase the limit (as described here: Pillow in Python won't let me open image ("exceeds limit"))
How can I load it in a way that lowers the resolution to just below the limit, so that the lower-resolution image is referenced in img without raising any error?
It'd be great to have a Python solution but if not, any other solution will work too.
Update (to answer questions in the comments):
These images are derived from PDFs for machine learning (ML). The PDFs come from outside the system, so we have to protect it from possible decompression bombs. For most ML, pixel-size requirements are well below the limit imposed by PIL, so we are OK with that limit as a heuristic to protect us.
Our current option is to use pdf2image which converts pdfs to images and specify a pixel size (e.g. width=1700 pixels, height=2200 pixels) there but I was curious if this can be done at the point of loading an image.
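One sketch of doing it at image-load time with pure Pillow: catch the error, reopen with the check temporarily disabled, and use JPEG draft mode plus downsampling so the decoded result stays under the cap. `open_capped` is a hypothetical helper, and `draft()` only reduces decode work for JPEGs; other formats are decoded in full before being shrunk:

```python
from PIL import Image

def open_capped(path, limit):
    """Hypothetical helper: open `path`, but if it trips the bomb
    check, decode at reduced resolution so the result stays under
    `limit` pixels."""
    try:
        return Image.open(path)
    except Image.DecompressionBombError:
        saved = Image.MAX_IMAGE_PIXELS
        Image.MAX_IMAGE_PIXELS = None        # disable the check temporarily
        try:
            img = Image.open(path)
            img.draft("RGB", (1, 1))         # JPEG only: decode at up to 1/8 scale
            while img.size[0] * img.size[1] > limit:
                img = img.resize((max(1, img.size[0] // 2),
                                  max(1, img.size[1] // 2)))
            img.load()                       # force decode before re-arming the check
            return img
        finally:
            Image.MAX_IMAGE_PIXELS = saved
```

Note the check is only disabled inside the helper, so the rest of the system keeps the protection.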

What is the most CPU efficient way to resize big images in Python

I'm looking for the most efficient way to resize images. PIL works well if the images are relatively small (for example 3000x2000), but if the resolution is big (16000x12000) it takes a long time to process. The images don't have to look pretty; I'm resizing them for comparison, to find copies of an image with NRMSE.
from PIL import Image

img1 = Image.open("img1.jpg")
img2 = Image.open("img2.jpg")
print(img1.size)
print(img2.size)
# add width to height to see which resolution is bigger
im1s = img1.size[0] + img1.size[1]
im2s = img2.size[0] + img2.size[1]
# if both images are bigger than 3000 pixels make them smaller for comparison
if im1s > 3000 and im2s > 3000:
    print("Width plus height of both images is bigger than 3000 pixels, resizing them for easier comparison")
    im1_resize = img1.resize((640, 480), Image.ANTIALIAS)
    im2_resize = img2.resize((640, 480), Image.ANTIALIAS)
    im1_resize.save('im1r.jpg')
    im2_resize.save('im2r.jpg')
You should pass the Image.NEAREST parameter when resizing, i.e.:
im1_resize = img1.resize((640, 480), Image.NEAREST)
This takes only the closest source pixel for each output pixel, and is therefore the fastest resampling method.
When using ANTIALIAS, multiple source pixels are sampled to produce each output pixel, which is much slower.
Note that most likely your bottleneck is writing out those files, not the resizing itself.
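For JPEG sources there is a further shortcut worth knowing: Pillow's `draft()` asks the decoder to decompress at 1/2, 1/4, or 1/8 scale, so most of the 16000x12000 pixels are never decoded at all. `fast_downscale` is a hypothetical helper sketching the idea; `draft()` is a no-op for non-JPEG formats:

```python
from PIL import Image

def fast_downscale(path, size=(640, 480)):
    img = Image.open(path)
    # JPEG only: ask the decoder for a reduced-scale decode; this is
    # far cheaper than decoding every pixel and resizing afterwards.
    img.draft("RGB", size)
    # draft() only gets within a power-of-two factor of the target,
    # so finish with one cheap NEAREST resize.
    return img.resize(size, Image.NEAREST)
```

For comparison work where quality does not matter, this combination is usually much faster than a full decode plus ANTIALIAS resize.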
I have two recommendations: one is libvips and the other is jpegtran-cffi. Since I did no benchmarking, I will just note the strong points of each library.
Libvips supports a large range of image formats and gains its speed through smart decisions that enable multithreading and the use of fast CPU instructions, explained here along with benchmarks. A Python binding is also available here.
Jpegtran works only on JPEGs. It gets its speed by operating directly on the JPEG data without recompressing the final output, as explained in the main README together with a benchmark.
My guess is that jpegtran is single-threaded, and that it could outperform libvips when run in a multiprocessing environment. The only comparison we have is each project's benchmark against Pillow: libvips outperforms Pillow by a factor of about 6, and jpegtran by perhaps a factor of 2.

Loading many images in PyGame with limited RAM

I'm using PyGame on a Raspberry Pi, so I only have 512mb of RAM to work with. I have to load and display a lot of images in succession, though. I can't naively load all of these images into RAM as PyGame surfaces - I don't have enough RAM. The images themselves are fairly small, so I assume that PyGame surfaces are fairly big, and this is why I run out of RAM. I've tried loading from the disk every time I want to display an image, but that's obviously slow (noticeably so).
Is there a reasonable way to display lots of images in succession in PyGame with limited RAM - either by keeping the size in memory of the PyGame surface as low as possible, or some other way?
Converting your files to BMP should help them load faster. If RAM is really that tight, you should lower the resolution of your files using an image editor such as Preview or Paintbrush. Space might also be saved through more efficient programming, for example by keeping objects in a list and updating them with a single list update.
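Another option, different from the answer above, is to cap how many decoded surfaces stay resident at once with a small LRU cache, so recently shown images redisplay instantly while RAM use stays bounded. A sketch with a pluggable loader; in practice the loader would be `pygame.image.load`, which is assumed here rather than imported:

```python
from collections import OrderedDict

class ImageCache:
    """Keep at most `capacity` decoded images in RAM, evicting the
    least recently used.  `loader` is any callable mapping a path to
    a decoded object (e.g. pygame.image.load)."""

    def __init__(self, loader, capacity=20):
        self.loader = loader
        self.capacity = capacity
        self._cache = OrderedDict()

    def get(self, path):
        if path in self._cache:
            self._cache.move_to_end(path)      # mark as recently used
            return self._cache[path]
        surf = self.loader(path)               # cache miss: hit the disk
        self._cache[path] = surf
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)    # evict the oldest entry
        return surf
```

Tuning `capacity` trades RAM for fewer disk reads; on a 512 MB Pi you would size it to the surfaces' measured memory footprint.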

Pygtk Image Loading too slow / load smaller

I am writing a file browser using pygtk. For image files I am showing some previews by loading images by pixbuf_new_from_file and scaling them. In directories with large files (like when browsing a portfolio) it takes too long. Is it possible to load the images with lower resolution?
Whole code can be found on Git. In dirFrame.py the function renderMainDirContent is the part that takes too long.
pixbuf_new_from_file_at_size seems to load the full image and then scale it, as it has almost no effect on performance.
It seems there is no faster way to do this in Python. Using numpy to load and scale the images improves performance, but you need to save thumbnails to get acceptable performance, at least for large images.
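The thumbnail-saving idea from that answer can be sketched as follows, using Pillow as a stand-in for the pixbuf API; `cached_thumbnail`, the cache directory layout, and the `.png` naming are all assumptions of this sketch:

```python
import os
from PIL import Image

def cached_thumbnail(path, thumb_dir, size=(128, 128)):
    """Return a small preview, generating and saving it on first use
    so later directory views only read the tiny cached file."""
    os.makedirs(thumb_dir, exist_ok=True)
    thumb_path = os.path.join(thumb_dir, os.path.basename(path) + ".png")
    if not os.path.exists(thumb_path):
        img = Image.open(path)
        img.thumbnail(size)          # in-place, preserves aspect ratio
        img.save(thumb_path)
    return Image.open(thumb_path)
```

The first visit to a directory still pays the full decode cost, but every visit after that reads only the small cached files.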

implement image segmentation with python generator

Following the last question: read big image file as an array in python
Due to the memory limitation of my laptop, I would like to implement the image segmentation algorithm with a Python generator that reads a few pixels at a time, rather than the whole image.
My laptop runs Windows 7 (64-bit) with 4 GB of RAM and an Intel(R) Core(TM) i7-2860QM CPU, and the images I am processing are over 2 GB. The algorithm I want to apply is watershed segmentation: http://scikits-image.org/docs/dev/auto_examples/plot_watershed.html
The only similar example I can find is http://vkedco.blogspot.com/2012/04/rgb-to-gray-level-to-binary-python.html, but what I need is not just converting one pixel value at a time; I need to consider the relations among nearby pixels. How can I do that?
Any idea or hint for me? Thanks in advance!
Since the RGB to graylevel conversion operation is purely local, a streaming approach is trivial; the position of the pixels is irrelevant. Watershed is a global operation. One pixel can change the output dramatically. You have several options:
Write an implementation of Watershed that works on tiles and iterates on many passes through the image. This sounds difficult to me.
Use a local method to segment (e.g. thresholding).
Get a computer with more RAM. RAM is cheap and you can stick tons of it into a desktop system.
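The second option (a purely local method) streams naturally. A sketch with numpy memory maps, processing a strip of rows at a time so the 2 GB image never has to fit in RAM; the raw uint8 grayscale file layout and the function name are assumptions of this sketch:

```python
import numpy as np

def threshold_in_strips(src_path, dst_path, shape, thresh=128, rows=256):
    # Memory-map both files: only the strips actually touched get
    # paged into RAM, regardless of the total image size.
    src = np.memmap(src_path, dtype=np.uint8, mode="r", shape=shape)
    dst = np.memmap(dst_path, dtype=np.uint8, mode="w+", shape=shape)
    for r in range(0, shape[0], rows):
        strip = src[r:r + rows]
        # Thresholding is local: each output pixel depends only on the
        # corresponding input pixel, so strips can be processed alone.
        dst[r:r + rows] = np.where(strip > thresh, 255, 0)
    dst.flush()
```

Watershed cannot be split this way because, as noted above, one pixel can change the whole labeling; a global method needs either tiling with multiple passes or enough RAM.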
