Opening and saving of bitmap images with python affects filesize - python

I have an 800x800 RGB bitmap whose file size is 2501 KB, and I do the following (using Python 3.6):
(Unfortunately I cannot share the image.)
from PIL import Image
import numpy as np
im = Image.open('original_image.bmp')
im.save("test_size_manual.bmp", "BMP")
For some reason the new file is only 1876 KB. Even though the file size is different, the following holds:
import matplotlib.pylab as plt
original_image = plt.imread('original_image.bmp')
test_size_image = plt.imread('test_size_manual.bmp')
assert (original_image == test_size_image).all()
This means that pixel for pixel the resulting numpy.ndarray is the same. From a 'random' sampling of 800x800 BMPs found on Google Images, most had the same file size as the new image, 1876 KB, but there was also at least one with the same file size as the original, 2501 KB.
What is causing this difference in filesize, or how would you go about finding out?

The answer is indeed found in the metadata.
The original image turns out to be a 32-bit bitmap and the new image is a 24-bit bitmap. This explains the difference in file size: 2501 * 3/4 is just under 1876.
The bit depth is stored at offset 28 (0x1C) of the file: it was 32 for the original and 24 for the new image.
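You can read that field directly with the standard library; a minimal sketch (the helper name is my own):

```python
import struct

def bmp_bit_depth(path):
    """Return the biBitCount field of a BMP header (offset 28, 0x1C)."""
    with open(path, 'rb') as f:
        header = f.read(30)
    # bits-per-pixel is a little-endian 16-bit integer at byte offset 28
    return struct.unpack_from('<H', header, 28)[0]
```

Here bmp_bit_depth('original_image.bmp') would report 32, and bmp_bit_depth('test_size_manual.bmp') would report 24.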
Reference: BMP file format on Wikipedia

Related

PIL - resizing an image - different numpy array

I am reading an image from an S3 bucket, then resizing it and getting the numpy array of the resized image, called "a". I also save the resized image, reopen it, and get the numpy array of that, called "b". My question is: why are a and b different?
resp = s3.get_object(Bucket=event['bucket'], Key=event['image_keys'][0])
data = resp['Body']
image_as_bytes = io.BytesIO(data.read())
image = Image.open(image_as_bytes).convert('RGB').resize((299, 299),Image.NEAREST)
a = np.asarray(image)
image.save('IMAGE_58990004_110132026B_13d64039_resized_lambda.jpg')
b = np.asarray(Image.open('IMAGE_58990004_110132026B_13d64039_resized_lambda.jpg'))
Does ".save" change the numpy array?
Assuming that image.save(...) uses the filename ending (.jpg) to pick a file format (I don't know if it does, but it seems reasonable), you are saving as a JPEG file, and the JPEG compression algorithm is lossy, i.e., it discards some information to make the file smaller.
Try using a file format with lossless compression, such as PNG.
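A quick way to see the difference is to round-trip the same pixels through both formats; a sketch using a synthetic image in place of the S3 one (the file paths are made up):

```python
from PIL import Image
import numpy as np
import os, tempfile

# Synthetic 299x299 RGB image standing in for the resized S3 image
rng = np.random.default_rng(0)
a = rng.integers(0, 256, (299, 299, 3), dtype=np.uint8)
image = Image.fromarray(a)

png_path = os.path.join(tempfile.gettempdir(), 'roundtrip.png')
jpg_path = os.path.join(tempfile.gettempdir(), 'roundtrip.jpg')

image.save(png_path)               # lossless: pixels survive exactly
b = np.asarray(Image.open(png_path))
print((a == b).all())              # True

image.save(jpg_path, quality=95)   # lossy: pixel values generally change
c = np.asarray(Image.open(jpg_path))
print((a == c).all())              # False for this noisy test image
```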

How to adjust Pillow EPS to JPG quality

I'm trying to convert EPS images to JPEG using Pillow, but the results are of low quality. I'm trying to use the resize method, but it gets completely ignored. I set the size of the JPEG image to (3600, 4700), but the resulting image has size (360, 470). My code is:
eps_image = Image.open('img.eps')
height = eps_image.height * 10
width = eps_image.width * 10
new_size = (height, width)
print(new_size)  # prints (3600, 4700)
eps_image.resize(new_size, Image.ANTIALIAS)
eps_image.save(
    'img.jpeg',
    format='JPEG',
    dpi=(9000, 9000),
    quality=95)
UPD. Vasu Deo.S noticed one of my errors, and thanks to him the JPG image has become bigger, but the quality is still low. I've tried different DPI values, sizes, and resample values for the resize function, but the result does not change much. How can I make it better?
The problem is that PIL is a raster image processor, as opposed to a vector image processor. It "rasterises" vector images (such as your EPS file and SVG files) onto a grid when it opens them because it can only deal with rasters.
If that grid doesn't have enough resolution, you can never regain it. Normally, it rasterises at 100 dpi, so if you want to make bigger images, you need to rasterise onto a larger grid before you even get started.
Compare:
from PIL import Image
eps_image = Image.open('image.eps')
eps_image.save('a.jpg')
The result is 540x720.
And this:
from PIL import Image
eps_image = Image.open('image.eps')
# Rasterise onto 4x higher resolution grid
eps_image.load(scale=4)
eps_image.save('a.jpg')
The result is 2160x2880.
You now have enough quality to resize however you like.
Note that you don't need to write any Python to do this at all - ImageMagick will do it all for you. It is included in most Linux distros and is available for macOS and Windows; you just use it in the terminal. The equivalent command is like this:
magick -density 400 input.eps -resize 800x600 -quality 95 output.jpg
It's because eps_image.resize(new_size, Image.ANTIALIAS) returns a resized copy of the image. Therefore you have to store it in a separate variable. Just change:
eps_image.resize(new_size, Image.ANTIALIAS)
to
eps_image = eps_image.resize(new_size, Image.ANTIALIAS)
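In other words, resize() leaves the original image untouched; a small sketch of that behaviour (sizes taken from the question):

```python
from PIL import Image

# resize() returns a new Image object; the original keeps its size
img = Image.new('RGB', (360, 470))
resized = img.resize((3600, 4700), Image.BICUBIC)
print(img.size)      # (360, 470) -- unchanged
print(resized.size)  # (3600, 4700)
```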
UPDATE:
These may not solve the problem completely, but they should still help.
You are trying to save your output image as .jpeg, which is a lossy compression format, so information is discarded during compression (for the most part). Change the output file extension to a lossless compression format like .png so that data is not compromised during compression. Also change quality=95 to quality=100 in Image.save().
You are using Image.ANTIALIAS for resampling the image, which is not that good when upscaling (it has been replaced by Image.LANCZOS in newer versions; the old name remains for backward compatibility). Try Image.BICUBIC instead, which produces quite favorable results (for the most part) when upscaling.

Why does an image of size 9 MB on disk occupy 125 MB in RAM when loaded into numpy?

Link to the image in question
Let me reproduce the issue I'm facing.
from skimage import io
image = io.imread("https://github.com/thalishsajeed/PythonLearn/raw/master/Houston_Chronicle__May_19_2018_51.jpg")
print((image.nbytes/(1024*1024)))
Result: 125.87553691864014
So how is it that a 9.45 MB file blows up to 125 MB when loaded into a numpy array using skimage? (I was able to replicate the same results using OpenCV's cv2.imread function as well.)
I guess this has something to do with JPEG compression, but if anyone can provide a more detailed explanation I'd really appreciate it.
Because that is a JPEG image file, which is a compressed image format. Your image resolution is 6633x6633 and it is a color image. Meaning that:
Number of pixels = 6633 * 6633
Total byte size = 6633 * 6633 * 3 (RGB, each color channel is 1 byte) ≈ 125.9 MB
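The arithmetic checks out against the figure measured above:

```python
width = height = 6633
channels = 3                       # RGB, one uint8 byte per channel
nbytes = width * height * channels
print(nbytes / (1024 * 1024))      # 125.875..., matching image.nbytes
```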

Save 1 bit deep binary image in Python

I have a binary image in Python and I want to save it in my pc.
I need it to be a 1 bit deep png image once stored in my computer.
How can I do that? I tried with both PIL and cv2 but I'm not able to save it with 1 bit depth.
I found myself in a situation where I needed to create a lot of binary images, and was frustrated with the available info online. Thanks to the answers and comments here and elsewhere on SO, I was able to find an acceptable solution. The comment from @Jimbo was the best so far. Here is some code to reproduce my exploration of some ways to save binary images in python:
Load libraries and data:
from skimage import data, io, util #'0.16.2'
import matplotlib.pyplot as plt #'3.0.3'
import PIL #'6.2.1'
import cv2 #'4.1.1'
check = util.img_as_bool(data.checkerboard())
The checkerboard image from skimage has dimensions of 200x200. Without compression, as a 1-bit image it should be represented by 200 * 200 / 8 = 5000 bytes.
To save with skimage, note that the package will complain if the data is not uint, hence the conversion. Saving the image takes an average of 2.8ms and has a 408 byte file size
io.imsave('bw_skimage.png',util.img_as_uint(check),plugin='pil',optimize=True,bits=1)
Using matplotlib, 4.2ms and 693 byte file size
plt.imsave('bw_mpl.png',check,cmap='gray')
Using PIL, 0.5ms and 164 byte file size
img = PIL.Image.fromarray(check)
img.save('bw_pil.png',bits=1,optimize=True)
Using cv2, also complains about a bool input. The following command takes 0.4ms and results in a 2566 byte file size, despite the png compression...
_ = cv2.imwrite('bw_cv2.png', check.astype(int), [cv2.IMWRITE_PNG_BILEVEL, 1])
PIL was clearly the best for speed and file size.
I certainly missed some optimizations, comments welcome!
Use:
cv2.imwrite(<image_name>, img, [cv2.IMWRITE_PNG_BILEVEL, 1])
(this will still use compression, so in practice it will most likely have less than 1 bit per pixel)
If you're not loading PNGs or anything, the format behaves reasonably enough to just write it yourself. Then your code doesn't need PIL or any of the headaches of various imports and imports on imports, etc.
import struct
import zlib
from math import ceil
def write_png_1bit(buf, width, height, stride=None):
    if stride is None:
        stride = int(ceil(width / 8))
    # prepend the filter-type byte (0 = None) to each of the height scanlines
    raw_data = b"".join(
        b'\x00' + buf[span:span + stride]
        for span in range(0, height * stride, stride))
    def png_pack(png_tag, data):
        chunk_head = png_tag + data
        return (struct.pack("!I", len(data)) + chunk_head
                + struct.pack("!I", 0xFFFFFFFF & zlib.crc32(chunk_head)))
    return b"".join([
        b'\x89PNG\r\n\x1a\n',
        # IHDR: width, height, bit depth 1, color type 0 (grayscale)
        png_pack(b'IHDR', struct.pack("!2I5B", width, height, 1, 0, 0, 0, 0)),
        png_pack(b'IDAT', zlib.compress(raw_data, 9)),
        png_pack(b'IEND', b'')])
Adapted from:
http://code.activestate.com/recipes/577443-write-a-png-image-in-native-python/ (MIT)
by reading the png spec:
https://www.w3.org/TR/PNG-Chunks.html
Keep in mind that the 1-bit data in buf should be written left to right, as the PNG spec requires in normal non-interlaced mode (which we declared). Excess bits pad out the final byte of each scanline if needed, and stride is the number of bytes needed to encode one scanline. Also, if you want those 1-bit values to have palette colors, you'll have to write a PLTE chunk and switch the color type to 3 rather than 0. Etc.
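If you are building buf from a numpy array, np.packbits already produces exactly that left-to-right, MSB-first layout; a small sketch (the 16x16 pattern is arbitrary):

```python
import numpy as np

mask = np.zeros((16, 16), dtype=bool)
mask[::2, ::2] = True                 # sparse checkerboard pattern
packed = np.packbits(mask, axis=1)    # 8 pixels per byte, MSB first
buf = packed.tobytes()
print(len(buf))                       # 16 rows * 2-byte stride = 32
```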

What is causing my Python program to run out of memory using opencv?

I wrote a program to read images using Python's OpenCV and tried to load 3 GB of images, but the program aborted.
There is 32 GB of memory on my PC, but when I run this program it runs out. What is the cause?
No error message is issued, and the PC becomes abnormally slow. I confirmed with Ubuntu's System Monitor that it ran out of both memory and swap.
I import the images into one array to pass to a TensorFlow deep-learning program. The images are 200 x 200 color images.
I use 64 bit version of Python.
import os
import numpy as np
import cv2

IMG_SIZE = 200

def read_images(path):
    dirnames = sorted(os.listdir(path))
    files = [sorted(os.listdir(path + dirnames[i]))
             for i in range(len(dirnames))]
    i = 0
    images = []
    for fs in files:
        tmp_images = []
        for f in fs:
            img = cv2.imread(path + dirnames[i] + "/" + f)
            img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
            img = img.flatten().astype(np.float32) / 255.0
            tmp_images.append(img)
        i = i + 1
        images.append(tmp_images)
    return np.asarray(images)
Reasons for running out of memory:
Image file size and the size of the corresponding array in memory are different. Images in, e.g., PNG and JPEG format are compressed; the size of the corresponding uncompressed BMP image is more relevant here. Also, an ndarray holds some meta-information that makes it a bit larger.
Converting from uint8 to float32 multiplies the size by 4. Try to avoid this if possible (I recognize uint8 imposes some limitations, like being unable to normalize and center the data).
Possible remedies:
Use numpy.memmap to create an array stored on disk
Reduce the quality of the images by converting to grayscale and/or reducing the resolution.
Train the model on a smaller number of images.
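A sketch of the numpy.memmap remedy; the shape and path are made up (here 100 flattened 200x200 RGB images as float32):

```python
import numpy as np
import os, tempfile

path = os.path.join(tempfile.gettempdir(), 'images.dat')
# The array is backed by a file on disk; only the pages actually
# touched need to occupy RAM
images = np.memmap(path, dtype=np.float32, mode='w+',
                   shape=(100, 200 * 200 * 3))
images[0] = 0.5      # written through to the backing file
images.flush()
print(images.shape)  # (100, 120000)
```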