Joining multiple huge images with pyvips

Joining multiple huge images with pyvips - python

I'm trying to figure out how to join multiple images with vips via python. I have in a folder lets say 30 ( but can be more than 600 ) png files that are stripes, they have resolution 854x289920 ( all the same resolution )...
PIL in python will immediately die if I try join them together horizontally with MemmoryError. So I google around and found VIPS which can do both things I need join the images and make deep zoom image from result.
Unfortunately I'm not sure how to correctly join them horizontally in python.
I have in array a list of images from folder, but how would I loop through them and sequentially write joined image out to disk ?

Just for reference, you can also do this at the command line. Try:
vips arrayjoin "a.png b.png c.png" mypyr.dz --across 3
Will join three PNG images horizontally and save the result as a deepzoom pyramid called mypyr. The arrayjoin docs have all of the options:
https://www.libvips.org/API/current/libvips-conversion.html#vips-arrayjoin
You can give the pyramid builder parameters by enclosing them in square brackets after the .dz.
vips arrayjoin "a.png b.png c.png" mypyr.dz[overlap=0,container=zip] --across 3
On Windows, deepzoom pyramids can be very slow to write since Windows hates creating files, and hates huge directories. If you write with container=zip, vips will directly create a .zip file containing the pyramid. This makes pyramid creation around 4x faster.

This also seems to be working fine for opening large number of images and doing joinarray on them so they are next to each other. Thanks #user894763
import os
import pyvips
# natsort helps with sorting the list logically
from natsort import natsorted
source = r"E:/pics/"
output = r"E:/out/"
save_to = output + 'final' + '.tif'
# define list of pictures we are going to get from folder
list_of_pictures = []
# get the
for x in os.listdir(source):
list_of_pictures.append(source + x)
# list_of_pictures now contains all the images from folder including full path
# since os.listdir will not guarantee proper order of files we use natsorted to do it
list_of_pictures = natsorted(list_of_pictures)
array_images = []
image = None
# lets create array of the images for joining, using sequential so it use less ram
for i in list_of_pictures:
tile = pyvips.Image.new_from_file(i, access="sequential")
array_images.append(tile)
# Join them, across is how many pictures there be next to each other, so i just counted all pictures in array with len
out = pyvips.Image.arrayjoin(array_images, across=len(list_of_pictures))
# write it out to file....
out.write_to_file(save_to, Q=95, compression="lzw", bigtiff=True)

I figured it out with some questions still:
import pyvips
list_of_pictures = []
for x in os.listdir(source):
list_of_pictures.append(source + x)
image = None
for i in list_of_pictures:
tile = pyvips.Image.new_from_file(i, access="sequential")
image = tile if not image else image.join(tile, "horizontal")
image.write_to_file(save_to)
which yes, produce tif with joined pictures... but holly cow ! the original pictures are png (30x ) all together 4.5GB the result tiff is 25GB ! what gives, why is there such a huge size difference ?

Related

How to create a list of DICOM files and convert it to a single numpy array .npy?

I have a problem and don't know how to solve:
I'm learning how to analyze DICOM files with Python and, so,
I got a patient exam, on single patient and one single exam, which is 200 DICOM files all of the size 512x512 each archive representing a different layer of him and I want to turn them into a single archive .npy so I can use in another tutorial that I found online.
Many tutorials try to convert them to jpg or png using opencv first, but I don't want this since I'm not interested in a friendly image to see right now, I need the array. Also, this step screw all the quality of images.
I already know that using:
medical_image = pydicom.read_file(file_path)
image = medical_image.pixel_array
I can grab the path, turn 1 slice in a pixel array and them use it, but the thing is, it doesn't work in a for loop.
The for loop I tried was basically this:
image = [] # to create an empty list
for f in glob.iglob('file_path'):
img = pydicom.dcmread(f)
image.append(img)
It results in a list with all the files. Until here it goes well, but it seems it's not the right way, because I can use the list and can't find the supposed next steps anywhere, not even answers to the errors that I get in this part, (so I concluded it was wrong)

The following code snippet allows to read DICOM files from a folder dir_path and to store them into a list. Actually, the list does not consist of the raw DICOM files, but is filled with NumPy arrays of Hounsfield units (by using the apply_modality_lut function).
import os
from pathlib import Path
import pydicom
from pydicom.pixel_data_handlers import apply_modality_lut
dir_path = r"path\to\dicom\files"
dicom_set = []
for root, _, filenames in os.walk(dir_path):
for filename in filenames:
dcm_path = Path(root, filename)
if dcm_path.suffix == ".dcm":
try:
dicom = pydicom.dcmread(dcm_path, force=True)
except IOError as e:
print(f"Can't import {dcm_path.stem}")
else:
hu = apply_modality_lut(dicom.pixel_array, dicom)
dicom_set.append(hu)

You were well on your way. You just have to build up a volume from the individual slices that you read in. This code snippet will create a pixelVolume of dimension 512x512x200 if your data is as advertised.
import dicom
import numpy
images = [] # to create an empty list
# Read all of the DICOM images from file_path into list "images"
for f in glob.iglob('file_path'):
image = pydicom.dcmread(f)
images.append(image)
# Use the first image to determine the number of rows and columns
repImage = images[0]
rows=int(repImage.Rows)
cols=int(repImage.Columns)
slices=len(images)
# This tuple represents the dimensions of the pixel volume
volumeDims = (rows, cols, slices)
# allocate storage for the pixel volume
pixelVolume = numpy.zeros(volumeDims, dtype=repImage.pixel_array.dtype)
# fill in the pixel volume one slice at a time
for image in images:
pixelVolume[:,:,i] = image.pixel_array
#Use pixelVolume to do something interesting
I don't know if you are a DICOM expert or a DICOM novice, but I am just accepting your claim that your 200 images make sense when interpreted as a volume. There are many ways that this may fail. The slices may not be in expected order. There may be multiple series in your study. But I am guessing you have a "nice" DICOM dataset, maybe used for tutorials, and that this code will help you take a step forward.

How do I read and label images stored in two different directories for binary classification?

The two directories I have contain many images, each image having a unique name. One directory is healthy, and another one is unhealthy.
I need to read images from both the directories, and label the images from one directory as 1, and 0 from another directory, then create a numpy array with each each image mapped to its corresponding label.
healthy = glob.glob('/path to healthy directory/' + '*.nii.gz')
unhealthy = glob.glob('path to unhealthy directory' + '*.nii.gz')
images = []
for i in healthy:
img = nib.load(i).get_fdata()
images.append(img)
for i in unhealthy:
img = nib.load(i).get_fdata()
images.append(img)

Use glob.glob(/path/to/directory/**.<extension>) to get a list of images in a directory. You might need to use more than one call if there are multiple file extensions.
Once you have a list of images, you can do
np.concatenate(np.array([images]), np.ones((1, len(images))))
or
np.concatenate(np.array([images]), np.zeros((1, len(images))))
as appropriate.

There's a good tutorial here that addresses this, specifically the function you're likely looking for is this one. You have to use labels=inferred and name your directories based on the classes you're dealing with. Also, you likely don't need to use numpy arrays. You can probably use functions from here to do what you need to do. Best of luck!

Is there any way to use arithmetic ops on FITS files in Python?

I'm fairly new to Python, and I have been trying to recreate a working IDL program to Python, but I'm stuck and keep getting errors. I haven't been able to find a solution yet.
The program requires 4 FITS files in total (img and correctional images dark, flat1, flat2). The operations are as follows:
flat12 = (flat1 + flat2)/2
img1 = (img - dark)/flat12
The said files have dimensions (1024,1024,1). I have resized them to (1024,1024) to be able to even use im_show() function.
I have also tried using cv2.add(), but I get this:
TypeError: Expected Ptr for argument 'src1'
Is there any workaround for this? Thanks in advance.

To read your FITS files use astropy.io.fits: http://docs.astropy.org/en/latest/io/fits/index.html
This will give you Numpy arrays (and FITS headers if needed, there are different ways to do this, as explained in the documentation), so you could do something like:
>>> from astropy.io import fits
>>> img = fits.getdata('image.fits', ext=0) # extension number depends on your FITS files
>>> dark = fits.getdata('dark.fits') # by default it reads the first "data" extension
>>> darksub = img - dark
>>> fits.writeto('out.fits', darksub) # save output
If your data has an extra dimension, as shown with the (1024,1024,1) shape, and if you want to remove that axis, you can use the normal Numpy array slicing syntax: darksub = img[0] - dark[0].
Otherwise in the example above it will produce and save a (1024,1024,1) image.

Image segmentation using corresponding masks in python

I have corresponding masks to the images that I want to segment.
I put the images in one folder and their corresponding masks in another folder.
I'm trying to apply those masks or multiply them by the images using two for loops in python to get the segmented images.
I'm using the code below:
def ImageSegmentation():
SegmentedImages = []
for img_path in os.listdir('C:/Users/mab/Desktop/images/'):
img=io.imread('C:/Users/mab/Desktop/data/'+img_path)
for img_path2 in os.listdir('C:/Users/mab/Desktop/masks/'):
Mask = io.imread('C:/Users/mab/Desktop/masks/'+img_path2)
[indx, indy] = np.where(Mask==0)
Color_Masked = img.copy()
Color_Masked[indx,indy] = 0
matplotlib.image.imsave('C:/Users/mab/Desktop/SegmentedImages/'+img_path2,Color_Masked)
segs.append(Color_Masked)
return np.vstack(Color_Masked)
This code works when I try it for a single image and a single mask (without the folders and loops).
However, when I try to loop over the images and masks I have in the two folders, I get output images that are segmented by the wrong mask (not their corresponding mask).
I can't segment each single image alone without looping because I have more than 500 Images and their masks.
I don't know what I'm missing or placing wrong in this code and how can I fix it? Also, is there an easier way to get the segmented images?

Unless I have grossly misunderstood, you just need something like this:
import glob
filelist = glob.glob('C:/Users/mab/Desktop/images/*.png')
for i in filelist:
mask = i.replace("images","masks")
print(i,mask)
On my iMac, that sort of thing produces:
/Users/mark/StackOverflow/images/b.png /Users/mark/StackOverflow/masks/b.png
/Users/mark/StackOverflow/images/a.png /Users/mark/StackOverflow/masks/a.png

How to view real time mosaicing of large image?

I have built a code which will stitch 100X100 images approx. I want to view this stitiching process in real time. I am using pyvips to create large image. I am saving final image in .DZI format as it will take very less memory footprint to display.
Below code is copied just for testing purpose https://github.com/jcupitt/pyvips/issues/43.
#!/usr/bin/env python
import sys
import pyvips
# overlap joins by this many pixels
H_OVERLAP = 100
V_OVERLAP = 100
# number of images in mosaic
ACROSS = 40
DOWN = 40
if len(sys.argv) < 2 + ACROSS * DOWN:
print 'usage: %s output-image input1 input2 ..'
sys.exit(1)
def join_left_right(filenames):
images = [pyvips.Image.new_from_file(filename) for filename in filenames]
row = images[0]
for image in images[1:]:
row = row.merge(image, 'horizontal', H_OVERLAP - row.width, 0)
return row
def join_top_bottom(rows):
image = rows[0]
for row in rows[1:]:
image = image.merge(row, 'vertical', 0, V_OVERLAP - image.height)
return image
rows = []
for y in range(0, DOWN):
start = 2 + y * ACROSS
end = start + ACROSS
rows.append(join_left_right(sys.argv[start:end]))
image = join_top_bottom(rows)
image.write_to_file(sys.argv[1])
To run this code:
$ export VIPS_DISC_THRESHOLD=100
$ export VIPS_PROGRESS=1
$ export VIPS_CONCURRENCY=1
$ mkdir sample
$ for i in {1..1600}; do cp ~/pics/k2.jpg sample/$i.jpg; done
$ time ./mergeup.py x.dz sample/*.jpg
here cp ~/pics/k2.jpg will copy k2.jpg image 1600 times from pics folder, so change according to your image name and location.
I want to display this process in real time. Right now after creating final mosaiced image I am able to display. Just an idea,I am thinking to make a large image and display it, then insert smaller images. I don't know, how it can be done. I am confused as we also have to make pyramidal structure. So If we create large image first we have to replace each level images with the new images. Creating .DZI image is expensive, so I don't want to create it in every running loop. Replacing images may be a solution. Any suggestion folks??

I suppose you have two challenges: how to keep the pyramid up-to-date on the server, and how to keep it up-to-date on the client. The brute force method would be to constantly rebuild the DZI on the server, and periodically flush the tiles on the client (so they reload). For something like that you'll also need to add a cache bust to the tile URLs each time, or the browser will think it should just use its local copy (not realizing it has updated). Of course this brute force method is probably too slow (though it might be interesting to try!).
For a little more finesse, you'd want to make a pyramid that's exactly aligned with the sub images. That way when you change a single sub image, it's obvious which tiles need to be updated. You can do this with DZI if you have square sub images and you use a tile size that is some even fraction of the sub image size. Also no tile overlap. Of course you'll have to build your own DZI constructor, since the existing ones aren't primed to simply replace individual tiles. If you know which tiles you changed on the server, you can communicate that to the client (either via periodic polling or with something like web sockets) and then flush only those tiles (again with the cache busting).
Another solution you could experiment with would be to not attempt a pyramid, per se, but just a flat set of tiles at a reasonable resolution to allow the user to pan around the scene. This would greatly simplify your pyramid updating on the server, since all you would need to do would be replace a single image for each sub image. This could be loaded and shown in a custom (non-OpenSeadragon) fashion on the client, or you could even use OpenSeadragon's multi-image feature to take advantage of its panning and zooming, like here: http://www.letsfathom.com/ (each album cover is its own independent image object).

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.