High-performance Python library for reading TIFF files?

I am using code to read a .tiff file in order to calculate a fractal dimension. My code looks like this:
import matplotlib.pyplot as plt
raster = plt.imread('xyz.tif')
for i in range(x1, x2):
    for j in range(y1, y2):
        pixel = raster[i][j]
This works, but I have to read a lot of pixels, so I would like it to be fast (and, ideally, to minimize electricity usage, given current events). Is there a better library than matplotlib for this purpose? For example, could a library specialized for matrix operations, such as pandas, help? And would another language, such as C, give better performance than Python?

Edit: @cgohlke in the comments and others have found that cv2 is slower than tifffile for large and/or compressed images. It is best to test the different options on realistic data for your application.
I have found cv2 to be the fastest library for this. Reading 5000 128x128 uint16 TIFF images gives the following results:
import time
import matplotlib.pyplot as plt
t0 = time.time()
for file in files:
    raster = plt.imread(file)
print(f'{time.time()-t0:.2f} s')
1.52 s
import time
import numpy as np
from PIL import Image
t0 = time.time()
for file in files:
    im = np.array(Image.open(file))
print(f'{time.time()-t0:.2f} s')
1.42 s
import time
import tifffile
t0 = time.time()
for file in files:
    im = tifffile.imread(file)
print(f'{time.time()-t0:.2f} s')
1.25 s
import time
import cv2
t0 = time.time()
for file in files:
    im = cv2.imread(file, cv2.IMREAD_UNCHANGED)
print(f'{time.time()-t0:.2f} s')
0.20 s
cv2 is a computer vision library written in C++, which, as the other commenter mentioned, is much faster than pure Python. Note the cv2.IMREAD_UNCHANGED flag; without it, cv2 converts monochrome images to 8-bit BGR.
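A quick way to see the effect of the flag is to check the dtype and shape of the returned array. A minimal sketch (assuming xyz.tif is one of the 16-bit monochrome files above):
import cv2
# With IMREAD_UNCHANGED, the 16-bit single-channel data is preserved.
im = cv2.imread('xyz.tif', cv2.IMREAD_UNCHANGED)
print(im.dtype, im.shape)  # uint16 (128, 128)
# With the default flag, cv2 decodes to 8-bit, 3-channel BGR.
im = cv2.imread('xyz.tif')
print(im.dtype, im.shape)  # uint8 (128, 128, 3)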

I am not sure which library is the fastest, but I have had very good experience with Pillow:
from PIL import Image
raster = Image.open('xyz.tif')
You can then convert it to a numpy array:
import numpy
pixels = numpy.array(raster)
I would need to see the rest of the code to recommend any other libraries. As for the language question: C or C++ would give better performance, since they are compiled, low-level languages; how much you gain depends on how complex your operations are and how much data you need to process. C++ implementations have been shown to be 10-200x faster, with the gap growing with the complexity of the calculations. Hope this helps; if you have any further questions, just ask.
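For the specific loop in the question, though, most of the win is available without leaving Python: NumPy slicing replaces the per-pixel interpreter loop with one vectorized operation. A minimal sketch (the bounds are example values standing in for the question's x1, x2, y1, y2, and the mean is a stand-in for whatever per-window statistic the fractal-dimension code needs):
import numpy as np
from PIL import Image

raster = np.array(Image.open('xyz.tif'))
x1, x2, y1, y2 = 0, 64, 0, 64  # example bounds from the question
# One vectorized slice instead of a nested Python loop over pixels.
window = raster[x1:x2, y1:y2]
value = window.mean()  # stand-in for the real per-window computation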

Related

How to get an axial image from CT scan images? I am using the SLIVER07 dataset, which has 100+ images in a single .mhd file

I am using the SLIVER07 dataset for a liver segmentation task, but I am stuck on reading its images.
import SimpleITK as sitk
import numpy as np
import matplotlib.pyplot as plt
# reading a .mhd file from the SLIVER07 dataset
itkimage = sitk.ReadImage('liver-orig001.mhd')
ct_scan = sitk.GetArrayFromImage(itkimage)
plt.imshow(ct_scan[1])
You are trying to pass the entire 3D image volume to imshow. You could instead try:
plt.imshow(ct_scan[40,:,:])
This will show the 40th axial slice.
Of interest might be the platipy library, available on PyPI: pip install platipy. The built-in image visualiser (based on matplotlib) is perfect for 3D image visualisation in Python, and has lots of cool features.
A little demo:
import SimpleITK as sitk
from platipy.imaging import ImageVisualiser
img = sitk.ReadImage("image_filename.mhd")
vis = ImageVisualiser(img)
fig = vis.show()

How can I load a single TIFF image in parts into a numpy array, without loading the whole image into memory?

There is a 4 GB .TIF image that needs to be processed. As a memory constraint, I can't load the whole image into a numpy array, so I need to load it lazily, in parts, from the hard disk.
Basically I need to read it in chunks, and that needs to be done in Python as a project requirement. I also tried looking at the tifffile library on PyPI, but I found nothing useful. Please help.
pyvips can do this. For example:
import sys
import numpy as np
import pyvips
image = pyvips.Image.new_from_file(sys.argv[1], access="sequential")
for y in range(0, image.height, 100):
    area_height = min(image.height - y, 100)
    area = image.crop(0, y, image.width, area_height)
    array = np.ndarray(buffer=area.write_to_memory(),
                       dtype=np.uint8,
                       shape=[area.height, area.width, area.bands])
The access option to new_from_file turns on sequential mode: pyvips will only load pixels from the file on demand, with the restriction that you must read pixels out top to bottom.
The loop runs down the image in blocks of 100 scanlines. You can tune this, of course.
I can run it like this:
$ vipsheader eso1242a-pyr.tif
eso1242a-pyr.tif: 108199x81503 uchar, 3 bands, srgb, tiffload_stream
$ /usr/bin/time -f %M:%e ./sections.py ~/pics/eso1242a-pyr.tif
273388:479.50
So on this sad old laptop it took 8 minutes to scan a 108,000 x 82,000 pixel image, with a peak of about 270 MB of memory.
What processing are you doing? You might be able to do the whole thing in pyvips. It's quite a bit quicker than numpy.
import pyvips
img = pyvips.Image.new_from_file("space.tif", access='sequential')
out = img.resize(0.01, kernel="linear")
out.write_to_file("resized_image.jpg")
If you just want to convert the file to another format with a smaller size, this code will be enough; it does the job without any memory spike and in very little time.
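Since the question mentions tifffile: for TIFFs whose pixel data is stored uncompressed and contiguously, tifffile can memory-map the file, so only the slices you index are read from disk. A minimal sketch (the 100-row strip size is an arbitrary choice, and memmap raises an error for compressed or non-contiguous files):
import tifffile

# Returns a numpy.memmap; pixel data stays on disk until it is sliced.
im = tifffile.memmap('space.tif')
for y in range(0, im.shape[0], 100):
    block = im[y:y + 100]  # only this strip is read from disk
    # ... process block ...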

Converting image folder to numpy array is consuming the entire RAM

I am trying to convert the images folder of the celebA dataset (https://www.kaggle.com/jessicali9530/celeba-dataset) into a numpy array, which will later be converted into a .pkl file (so the data can be used as simply as MNIST or CIFAR).
I am looking for a better way of converting, since this method consumes the entire RAM.
from PIL import Image
import pickle
from glob import glob
import numpy as np
TARGET_IMAGES = "img_align_celeba/*.jpg"
def generate_dataset(glob_files):
    dataset = []
    for _, file_name in enumerate(sorted(glob(glob_files))):
        img = Image.open(file_name)
        pixels = list(img.getdata())
        dataset.append(pixels)
    return np.array(dataset)
celebAdata = generate_dataset(TARGET_IMAGES)
I am rather curious how the MNIST authors did this themselves, but any approach that works is welcome.
You can transform any kind of data on the fly in Keras and load it into memory one batch at a time during training.
See the Keras documentation and search for 'Example of using .flow_from_directory(directory)'.
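A minimal sketch of that approach (assuming the JPEGs sit in a subdirectory below img_align_celeba/, since flow_from_directory expects one subdirectory per class; the target_size matches the 218x178 aligned celebA images, and the batch size is arbitrary):
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1. / 255)
# Yields batches on the fly; only one batch is held in memory at a time.
batches = datagen.flow_from_directory(
    "img_align_celeba",
    target_size=(218, 178),
    batch_size=32,
    class_mode=None,  # no labels, just the images
)
x_batch = next(batches)  # numpy array of shape (32, 218, 178, 3)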

get a numpy array from a sequence of images in a folder

I have a folder, say video1, with a bunch of images in order: frame_00.png, frame_01.png, ...
What I want is a 4D numpy array in the format (number of frames, w, h, 3)
This is what I did, but I think it is quite slow. Is there a faster or more efficient method to achieve the same thing?
folder = "video1/"
import os
images = sorted(os.listdir(folder)) #["frame_00", "frame_01", "frame_02", ...]
from PIL import Image
import numpy as np
video_array = []
for image in images:
    im = Image.open(folder + image)
    video_array.append(np.asarray(im))  # .transpose(1, 0, 2)
video_array = np.array(video_array)
print(video_array.shape)
#(75, 50, 100, 3)
There's an older SO thread that goes into a great deal of detail (perhaps even a bit too much) on this very topic. Rather than vote to close this question as a dup, I'm going to give a quick rundown of that thread's top bullet points:
The fastest commonly available image reading function is imread from the cv2 package.
Reading the images in and then adding them to a plain Python list (as you are already doing) is the fastest approach for reading in a large number of images.
However, given that you are eventually converting the list of images to an array of images, every possible method of building up an array of images is almost exactly as fast as any other.
Although, interestingly enough, if you take the approach of assigning images directly to a preallocated array, it actually matters which indices (i.e. which dimension) you assign to in terms of getting optimal performance; see the sketch just below.
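A minimal sketch of that last point (the frame names and count are hypothetical; filling along the leading axis keeps each frame contiguous in memory):
import cv2
import numpy as np

filenames = [f"frame_{i:02d}.png" for i in range(75)]  # hypothetical names
# Preallocate once, then assign frame by frame along the first axis.
first = cv2.imread(filenames[0])
video_array = np.empty((len(filenames),) + first.shape, dtype=first.dtype)
for i, filename in enumerate(filenames):
    video_array[i] = cv2.imread(filename)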
So basically, you're not going to be able to get much faster while working in pure, single-threaded Python. You might get a boost from switching to cv2.imread (in place of PIL.Image.open).
PNG is an extremely slow format, so if you can use almost anything else, you'll see a big speedup.
For example, here's an opencv version of your program that gets the filenames from command-line args:
#!/usr/bin/python3
import sys
import cv2
import numpy as np
video_array = []
for filename in sys.argv[1:]:
    im = cv2.imread(filename)
    video_array.append(np.asarray(im))
video_array = np.array(video_array)
print(video_array.shape)
I can run it like this:
$ mkdir sample
$ for i in {1..100}; do cp ~/pics/k2.png sample/$i.png; done
$ time ./readframes.py sample/*.png
(100, 2048, 1450, 3)
real 0m6.063s
user 0m5.758s
sys 0m0.839s
So 6s to read 100 PNG images. If I try with TIFF instead:
$ for i in {1..100}; do cp ~/pics/k2.tif sample/$i.tif; done
$ time ./readframes.py sample/*.tif
(100, 2048, 1450, 3)
real 0m1.532s
user 0m1.060s
sys 0m0.843s
1.5s, so four times faster.
You might get a small speedup with pyvips:
#!/usr/bin/python3
import sys
import pyvips
import numpy as np
# map vips formats to np dtypes
format_to_dtype = {
    'uchar': np.uint8,
    'char': np.int8,
    'ushort': np.uint16,
    'short': np.int16,
    'uint': np.uint32,
    'int': np.int32,
    'float': np.float32,
    'double': np.float64,
    'complex': np.complex64,
    'dpcomplex': np.complex128,
}

# vips image to numpy array
def vips2numpy(vi):
    return np.ndarray(buffer=vi.write_to_memory(),
                      dtype=format_to_dtype[vi.format],
                      shape=[vi.height, vi.width, vi.bands])

video_array = []
for filename in sys.argv[1:]:
    vi = pyvips.Image.new_from_file(filename, access='sequential')
    video_array.append(vips2numpy(vi))
video_array = np.array(video_array)
print(video_array.shape)
I see:
$ time ./readframes.py sample/*.tif
(100, 2048, 1450, 3)
real 0m1.360s
user 0m1.629s
sys 0m2.153s
Another 10% or so.
Finally, as other posters have said, you could load frames in parallel. That wouldn't help TIFF much, but it would certainly boost PNG.
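A minimal sketch of parallel loading with only the standard library (a thread pool helps here because OpenCV releases the GIL while decoding; the worker count is an arbitrary choice):
#!/usr/bin/python3
import sys
from concurrent.futures import ThreadPoolExecutor

import cv2
import numpy as np

# Decode the images on 8 threads; result order matches sys.argv[1:].
with ThreadPoolExecutor(max_workers=8) as pool:
    frames = list(pool.map(cv2.imread, sys.argv[1:]))
video_array = np.array(frames)
print(video_array.shape)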

Importing PNG files for deep learning in Python

I am a novice coder, self-taught through Codecademy. I am wondering what the easiest way is to import PNG files into Python (2.7.14), with the goal of using them in a deep-learning program.
So far I have tried these two pieces of code:
import scipy
from scipy import misc
import glob
for image_path in glob.glob("/E:/_SAMM_Projects/gemini_hikai_DM_hack_complete/export/contact_frames/boat/*.png"):
    image = misc.imread(image_path)
    print image.shape
    print image.dtype
import scipy
from scipy import misc
import glob
import numpy as np
png = []
for image_path in glob.glob("/E:/_SAMM_Projects/gemini_hikai_DM_hack_complete/export/contact_frames/boat/*.png"):
    png.append(misc.imread(image_path))
im = np.asarray(png)
print "importing done...", im.shape
Both are based on templates I found online, and neither seems to work.
In the context of deep learning, I understand that you would like to read an image into a numpy array so you can use deep-learning models (such as ConvNets) on it.
I suggest using OpenCV for your purpose.
import cv2
image = cv2.imread("yourimg.png")
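One caveat if the array is going into a deep-learning model: cv2.imread returns channels in BGR order, while most pretrained models and image tools assume RGB, so a conversion is usually wanted:
import cv2

image = cv2.imread("yourimg.png")  # numpy array, BGR channel order
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # reorder to RGB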
