Saved webp images are 3x bigger than jpg in OpenCV - python

For some reason, on my Ubuntu 20.04 machine, when I use OpenCV in Python like:
cv2.imwrite("myfile.webp", cv2image)
the output file for an 800x600 px image is about 300 KiB, while if I do:
cv2.imwrite("myfile.jpg", cv2image)
the output file of the same pixel size is about 100 KiB.
Why is that, if WebP is supposed to be about 25% smaller than JPEG?
Or do I have to set some options first?
P.S. for png:
cv2.imwrite("myfile.png", cv2image)
the size is about 500 KiB.

WebP has two ways of saving data: lossy (what JPEG does), where information is discarded to reduce size, and lossless (what PNG does), with no data loss.
By default OpenCV uses cv2.IMWRITE_WEBP_QUALITY to determine the quality. Values from 1 to 100 select lossy compression; above 100 (or with no flag at all, per the docs) the encoder uses lossless compression, which is why your default output is so large.
https://docs.opencv.org/master/d8/d6a/group__imgcodecs__flags.html#gga292d81be8d76901bff7988d18d2b42aca7d3f848cc45d3138de0a2053d213a54a
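A quick way to see the effect for yourself (a minimal sketch; the test image path and quality values are illustrative):
import os
import cv2

# Sweep the quality flag and compare file sizes. Values 1-100 are
# lossy; above 100 (or omitting the flag entirely) libwebp switches
# to lossless mode, hence the large default output.
img = cv2.imread("myfile.png")
for q in (20, 80, 100, 101):
    path = f"out_q{q}.webp"
    cv2.imwrite(path, img, [int(cv2.IMWRITE_WEBP_QUALITY), q])
    print(q, os.path.getsize(path), "bytes")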

SOLVED! It has to be called like this to work:
cv2.imwrite("myfile.webp", cv2image, [int(cv2.IMWRITE_WEBP_QUALITY), 20])
Now the file is 4 kB ;D

Related

BigTiff to JPEG 2000 using Python

I am trying to convert large (50+ GB) BigTIFF images into JPEG 2000 format for post-analysis using Python. I have succeeded in converting the large BigTIFF files into JPEG using libvips; however, libvips does not have direct support for JPEG 2000 - what a bummer.
I have been able to write JPEG 2000 images using glymur, but the problem with glymur is that writing JPEG 2000 images is currently limited to images that can fit in memory. Since my workstation has only 8 GB of RAM, it would be impossible to convert a 50+ GB file into JPEG 2000.
If anyone could point me in the right direction on converting a BigTIFF into JPEG 2000 efficiently on a RAM-limited workstation, I would like to hear about it.
Cheers,
-Frank

Larger than expected file sizes when saving a TIFF with OpenCV

I am creating a Python program to load a .tiff image, crop out a selection from the image, and save the selection as a TIFF. The dataset images are large, exceeding 1 GB. I can successfully crop out what I need and save it as a TIFF, but the new image file sizes are much larger than what I expect and need.
Opening
I am using tifffile to open the image as a numpy array. OpenCV and PIL were not able to open the files due to size. I tried using OpenSlide as well, but encountered other problems down the road with read_region().
Cropping
The numpy array has the shape (height, width, 3), so I crop using something like large_image[top:bottom, left:right, :]. This works as intended.
Saving
Using cv2.imwrite() has resulted in the smallest file sizes thus far, but they are still much larger than they should be. PIL.Image.save() and TiffWriter from tifffile created even larger images.
Best results: Cropping 13 new images from a 250MB file - using only about 20% of the original image - gives me files totaling over 900MB. I would expect the total to be something like 50MB.
Note: The cropped .tiff files have correct dimensions. If the original is 200,000 x 50,000, then the cropped file will be, say, 8,000 x 3,000. Also, I am unable to open the original 250MB image using Preview on my Mac, but I can quickly open a 500MB cropped image created by my program when I save the image with TiffWriter (I can open files saved with opencv as well).
Summary of the code:
import tifffile
import cv2
import numpy as np
original_image = tifffile.imread('filepath')  # original_image is a numpy array
# ...calculations for top, bottom, etc...
cropped_image = original_image[top:bottom, left:right, :]
cv2.imwrite('output_filepath', cropped_image)
These 3 lines are all the IO that I use.
tl;dr - trying to load images, crop, and save new images as .tiff, but new file sizes are much larger than expected.
If you are on a Mac, homebrew is great and you can install libtiff and ImageMagick with:
brew install libtiff imagemagick
Then you can really start to understand what compression, bits per sample, and data sizes/types are in use with:
tiffinfo YOURINPUTFILE.TIF
tiffinfo YOUROUTPUTFILE.TIF
and:
magick identify -verbose YOURINPUTFILE.TIF
magick identify -verbose YOUROUTPUTFILE.TIF
If you want to see the two side-by-side, use:
magick identify -verbose YOURINPUTFILE.TIF > a
magick identify -verbose YOUROUTPUTFILE.TIF > b
opendiff a b
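Once those tools reveal the original's compression scheme (large scan-like TIFFs are often JPEG- or deflate-compressed internally, and a large size increase is expected if the crops are written with weaker or no compression), the usual fix is to compress on write. A sketch with tifffile, my addition rather than something from the thread; file names are illustrative and the compression argument needs a reasonably recent tifffile:
import tifffile

# Re-save a crop with explicit compression: 'zlib' is lossless, while
# compression='jpeg' (lossy, needs imagecodecs) would roughly match a
# JPEG-compressed source TIFF
cropped_image = tifffile.imread('cropped.tif')
tifffile.imwrite('cropped_small.tif', cropped_image, compression='zlib')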

Python 3: How to blur a GeoTIFF image with color table?

I've got a GeoTIFF image that I need to make blurry by applying a smoothing filter. The image itself contains metadata that needs to be preserved. It has a bit-depth of 8 and uses a color table with 256 32-bit RGBA values to look up a color for each pixel, but in order for the resulting image to look smooth it will probably have to use a bit-depth of 24 or 32 and no color table, alternatively use jpeg compression. What may complicate this further is that the image is 23,899x18,330 pixels large, which is almost five times as large as the largest file PIL wants to open by default.
How can I create the blurry version of this image in Python 3?
I have also tried using PIL to just open and save it again:
from PIL import Image
Image.MAX_IMAGE_PIXELS = 1000000000
im = Image.open(file_in)
im.save(file_out)
This code doesn't crash, and I get a new .tif file that is approximately as large as the original file, but when I try to open it in Windows Photo Viewer the application says it is corrupt, and it cannot be re-opened by PIL.
I have also tried using GDAL. When I try this code, I get an output image that is 835 MB large, which corresponds to an uncompressed image with a bit-depth of 16 (which is also what the file metadata says when I right-click on it and choose "Properties" – I'm using Windows 10). However, the resulting image is monochrome and very dark, and the colors look like they have been jumbled up, which makes me believe that the code I'm trying interprets the pixel values as intensity values and not as table keys.
So in order to make this method work, I need to figure out how to apply the color table (which is some sort of container for tuples, of type osgeo.gdal.ColorTable) to the raster band (whatever a raster band is), which is a numpy array with the shape (18330, 23899). That should give me a new numpy array with the shape (18330, 23899, 4) or (4, 18330, 23899) (I don't know which is the correct shape). Then I need to insert this back into the loaded image and remove the color table (or create a new one with the same metadata), and finally save the modified image with compression enabled, so I get closer to the original file size (11.9 MB) rather than the 835 MB I get now. How can I do that?
pyvips can process huge images quickly using just a small amount of memory, and supports palette TIFF images.
Unfortunately it won't support the extra geotiff tags, since libtiff won't work on unknown tag types. You'd need to copy that metadata over in some other way.
Anyway, if you can do that, pyvips should work on your image. I tried this example:
import sys
import pyvips
# the 'sequential' hint tells libvips that we want to stream the image
# and don't need full random access to pixels ... in this mode,
# libvips can read, process and write in parallel, and without needing
# to hold the whole image in memory
image = pyvips.Image.new_from_file(sys.argv[1], access='sequential')
image = image.gaussblur(2)
image.write_to_file(sys.argv[2])
On an image of the type and size you have, generating a JPEG-compressed TIFF:
$ tiffinfo x2.tif
TIFF Directory at offset 0x1a1c65c6 (438068678)
Image Width: 23899 Image Length: 18330
Resolution: 45118.5, 45118.5 pixels/cm
Bits/Sample: 8
Compression Scheme: None
Photometric Interpretation: palette color (RGB from colormap)
...
$ /usr/bin/time -f %M:%e python3 ~/try/blur.py x2.tif x3.tif[compression=jpeg]
137500:2.42
So 140 MB of memory and 2.5 seconds. The output image looks correct and is 24 MB, so not too far off yours.
A raster band is just the name given to each "layer" of the image, in your case they will be the red, green, blue, and alpha values. These are what you want to blur. You can open the image and save each band to a separate array by using data.GetRasterBand(i) to get the ith band (with 1-indexing, not 0-indexing) of the image you opened using GDAL.
You can then try and use SciPy's scipy.ndimage.gaussian_filter to achieve the blurring. You'll want to send it an array that is shape (x,y), so you'll have to do this for each raster band individually. You should be able to save your data as another GeoTIFF using GDAL.
If the colour table you are working with means that your data is stored in each raster band in some odd format that isn't just floats between 0 and 1 for each of R, G, B, and A, then consider using scipy.ndimage.generic_filter, although without knowing how your data is stored it's hard to give specifics on how you'd do this.
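A rough sketch of that recipe (my illustration, not from the thread; the file names and the sigma value are placeholders, and unlike the pyvips answer it holds the expanded RGBA array in memory, roughly 1.7 GB at this image size):
import numpy as np
from osgeo import gdal
from scipy.ndimage import gaussian_filter

src = gdal.Open('input.tif')
band = src.GetRasterBand(1)      # GDAL bands are 1-indexed
indices = band.ReadAsArray()     # shape (18330, 23899), palette keys

# Expand the color table into an RGBA lookup array, then index into it
ct = band.GetColorTable()
lut = np.array([ct.GetColorEntry(i) for i in range(ct.GetCount())],
               dtype=np.uint8)
rgba = lut[indices]              # shape (18330, 23899, 4)

# Blur each channel separately, since gaussian_filter wants a 2-D array
blurred = np.stack([gaussian_filter(rgba[..., c], sigma=2)
                    for c in range(4)], axis=-1)

# Write a 4-band GeoTIFF, copying the georeferencing from the source
drv = gdal.GetDriverByName('GTiff')
dst = drv.Create('blurred.tif', src.RasterXSize, src.RasterYSize, 4,
                 gdal.GDT_Byte, options=['COMPRESS=DEFLATE'])
dst.SetGeoTransform(src.GetGeoTransform())
dst.SetProjection(src.GetProjection())
for c in range(4):
    dst.GetRasterBand(c + 1).WriteArray(blurred[..., c])
dst.FlushCache()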

How to convert an .eps file into a .png in Python 3.6

With Python turtle I'm trying to save the canvas as a png. I've researched it and found a way to save it as an eps file without any modules, but I'm finding it hard to convert the eps file into a png.
Is there a way to convert EPS to PNG without downloading another module? If not, can someone tell me more about ImageMagick? I have looked at it, but I'm confused about how to use it. I've also seen it mentioned in connection with Linux; is it outdated?
If not converting EPS to PNG, is there an even simpler way to save the canvas as a PNG?
Btw I have seen this, but I don't understand it :/
How to convert a .eps file to a high quality 1024x1024 .jpg?
From the link you show, there is this Imagemagick command:
convert -density 300 image.eps -resize 1024x1024 image.jpg
Most EPS files are vector images. They have no physical size in pixels, since a vector drawing is a set of commands that describe how to draw each object. It is not a raster image containing pixels and does not have any particular set of pixel dimensions.
So with vector files, you set the printing density to tell Imagemagick (which passes it off to Ghostscript to do the rasterizing work) to convert the vector data to raster data and then save it as a raster format output image. Nominal density is 72 dpi (sometimes 92 or 96). So if you use -density 288 with the following command:
convert -density 288 image.eps image.png
It would result in an image that is 4 times larger in each dimension than if you just did
convert image.eps image.png
which for default dpi of 72 would be the same as
convert -density 72 image.eps image.png
Note that 72*4=288.
Now you have a large high quality raster png, especially if the eps file was line drawing with thin lines like blue-prints.
However if that is too large and you want to reduce it back to its nominal size by 1/4, you could do (note 1/4 = 25%)
convert -density 288 image.eps -resize 25% image.png
This process is sometimes called supersampling and would produce a better looking result than just doing
convert image.eps image.png
In the original command, they decide to get a high quality raster image and just resize to 1024x1024.
So you can resize to any size you want after producing a high definition raster image from the EPS vector image.
The larger the density you use, the higher the quality will be in the PNG, but it will take longer to process. So you have to trade time vs quality and pick the smallest density that produces good enough quality in a reasonable amount of time.
I do not know if Python Wand supports setting the density or if it supports reading PDF files, which requires Ghostscript. But you can use Python's subprocess module to make a call to an ImageMagick command line. See https://www.imagemagick.org/discourse-server/viewtopic.php?f=4&t=32920
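If Wand turns out not to expose it, a minimal subprocess sketch (assuming the ImageMagick convert binary is on your PATH) would be:
import subprocess

# Supersample: rasterize the EPS at 4x nominal density, then downsize
# to 25% (the same convert invocation shown above)
subprocess.run(['convert', '-density', '288', 'image.eps',
                '-resize', '25%', 'image.png'], check=True)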
I've been having problems with ImageMagick having had its security policy changed so it can't interact with Ghostscript. (for good reason... but it's questionable that it doesn't allow you to locally override the default policy so web apps can be protected while whitelisted uses can still work.)
For anyone else slamming into convert: not authorized, here's how to invoke Ghostscript directly:
gs -dSAFER -dEPSCrop -r300 -sDEVICE=png16m -o image.png image.eps
-dSAFER puts Ghostscript into sandboxed mode so you can interpret untrusted PostScript. (It should be the default, but backwards compatibility.)
-dEPSCrop asks Ghostscript not to pad the output out to the size of a printable page.
ImageMagick's -density 300 becomes -r300 when it invokes Ghostscript.
-sDEVICE is how you set the output format; png16m is 24-bit RGB PNG. (See the Devices section of the manual for other choices.)
-o is a shorthand for -dBATCH -dNOPAUSE -sOutputFile=
You could then use ImageMagick's mogrify command to resize it to fit exact pixel dimensions:
mogrify -resize 1024x1024 image.png
(mogrify is like convert but replaces the input file rather than writing to a new file.)
UPDATE: In hindsight, I should have checked whether Pillow supported EPS before posting that first answer.
The native Python solution would be to use Pillow's support for invoking Ghostscript (like ImageMagick in that respect, but with a native Python API).
However, Pillow's docs don't explain how they arrive at a size in pixels and only take an optional multiplier (scale) for the default size rather than an absolute DPI value.
If that doesn't bother you, and you've got both Pillow and Ghostscript installed, here's how to do it without ImageMagick:
#!/usr/bin/env python3
from PIL import Image

TARGET_BOUNDS = (1024, 1024)

# Load the EPS at 10 times whatever size Pillow thinks it should be
# (Experimentation suggests that scale=1 means 72 DPI but that would
# make 600 DPI scale=8⅓ and Pillow requires an integer)
pic = Image.open('image.eps')
pic.load(scale=10)

# Ensure scaling can anti-alias by converting 1-bit or paletted images
if pic.mode in ('P', '1'):
    pic = pic.convert("RGB")

# Calculate the new size, preserving the aspect ratio
ratio = min(TARGET_BOUNDS[0] / pic.size[0],
            TARGET_BOUNDS[1] / pic.size[1])
new_size = (int(pic.size[0] * ratio), int(pic.size[1] * ratio))

# Resize to fit the target size (ANTIALIAS was removed in Pillow 10;
# LANCZOS is the equivalent filter)
pic = pic.resize(new_size, Image.LANCZOS)

# Save to PNG
pic.save("image.png")

Python Pillow lowering image quality doesn't change file size

I am trying to lower the file size of an image using Pillow (PIL), however lowering the image quality doesn't lower the size of the saved image.
The saved images 'image2' and 'image3' are the same size.
from PIL import Image
im = Image.open('image.png')
im.save('image2.png', quality=100)
im.save('image3.png', quality=10)
The PNG format only supports lossless compression, for which the compression ratio is usually limited and not freely adjustable; the quality argument has no effect on PNG output.
If I am right, there is a parameter that tells the compressor to spend more or less time finding a better compression scheme, but without a guarantee of success.
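For what it's worth, in Pillow that knob is the compress_level argument to save() (zlib levels 0-9); a minimal sketch with illustrative file names:
from PIL import Image

im = Image.open('image.png')

# PNG ignores 'quality'; compress_level only trades encoding time
# for a usually modest, lossless size reduction
im.save('image_fast.png', compress_level=1)   # fast, larger file
im.save('image_small.png', compress_level=9)  # slow, smaller file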
To get a real size reduction you have to use lossy quantization - pngquant or similar:
https://pngquant.org/
