How to read a large image in chunks in Python?

I'm trying to compute the difference in pixel values of two images, but I'm running into memory problems because the images I have are quite large. Is there a way in Python to read an image in, let's say, 10x10 chunks at a time rather than trying to read in the whole image? I was hoping to solve the memory problem by reading the image in small chunks, assigning those chunks to NumPy arrays, and then saving those arrays with PyTables for further processing. Any advice would be greatly appreciated.

You can use numpy.memmap and let the operating system decide which parts of the image file to page in or out of RAM. If you use 64-bit Python, the virtual address space is astronomical compared to the available RAM.
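For example, if the two images are first dumped to disk as raw arrays, the difference can be computed a band of rows at a time. A minimal sketch, assuming raw uint8 data of the same known shape (the file names and shape here are hypothetical):

import numpy as np

# Hypothetical raw dumps of the two images; both must share this shape.
shape = (40_000, 40_000)
a = np.memmap("image_a.raw", dtype=np.uint8, mode="r", shape=shape)
b = np.memmap("image_b.raw", dtype=np.uint8, mode="r", shape=shape)
diff = np.memmap("diff.raw", dtype=np.int16, mode="w+", shape=shape)

# Process one band of rows at a time; the OS pages data in and out of RAM.
step = 1024
for r in range(0, shape[0], step):
    diff[r:r + step] = a[r:r + step].astype(np.int16) - b[r:r + step]
diff.flush()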

If you have time to preprocess the images you can convert them to bitmap files (which will be large, not compressed) and then read particular sections of the file via offset as detailed here:
Load just part of an image in python
Conversion from any file type to bitmap can be done in Python with this code:
from PIL import Image
file_in = "inputCompressedImage.png"
img = Image.open(file_in)
file_out = "largeOutputFile.bmp"
img.save(file_out)
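Once you have an uncompressed bitmap, a single row (or any block of rows) can be pulled out with a seek instead of loading the whole file. A minimal sketch, assuming a standard 24-bit BMP with a BITMAPINFOHEADER (the file name and row index are placeholders):

import struct

with open("largeOutputFile.bmp", "rb") as f:
    header = f.read(26)
    data_offset = struct.unpack_from("<I", header, 10)[0]  # start of pixel data
    width = struct.unpack_from("<i", header, 18)[0]
    height = struct.unpack_from("<i", header, 22)[0]
    row_size = (width * 3 + 3) & ~3   # rows are padded to a multiple of 4 bytes
    row = 100                         # row to read, counted from the top
    f.seek(data_offset + (height - 1 - row) * row_size)  # BMP rows are stored bottom-up
    row_bytes = f.read(width * 3)     # BGR triplets for that row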

Related

Python PIL: open many files and load them into memory

I have a dataset containing 3000 images in train and 6000 images in test. They are 320x320 RGB PNG files. I thought I could load this entire dataset into memory (since it's just 100 MB), but when I try to do that I get a "[Errno 24] Too many open files: ..." error. The loading code looks like this:
train_images = []
for index, row in dataset_p_train.iterrows():
    path = data_path / row.img_path
    train_images.append(Image.open(path))
I know that I'm opening 9000 files and not closing them, which isn't good practice, but unfortunately my classifier relies heavily on PIL's img.getcolors() method, so I really want to store the dataset in memory as a list of PIL images, not as a 3000x320x320x3 uint8 NumPy array, to avoid casting back to a PIL image every time I need an image's colors.
So, what should I do? Somehow increase the limit on open files? Or is there a way to make PIL images reside entirely in memory without staying "open" on disk?
Image.open is lazy. It will not load the data until you try to do something with it.
You can call the image's load method to explicitly load the file contents. This will also close the file, unless the image has multiple frames (for example, an animated GIF).
See File Handling in Pillow for more details.
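Applied to the loop above, calling load() right after open() keeps the pixel data in memory while releasing the file handle (a sketch reusing the names from the question):

train_images = []
for index, row in dataset_p_train.iterrows():
    path = data_path / row.img_path
    img = Image.open(path)
    img.load()  # reads the pixel data and closes the underlying file
    train_images.append(img)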

How to efficiently convert large CZI images (+50GB) into JP2 using Python?

I have to convert large CZI microscopy images (around 50 GB and up) into compressed JP2 images for post-analysis. The JP2 images need to be compressed in order to save disk space and to be analyzed afterwards with other software. My current setup only has 8 GB of RAM available, so I need to be able to process these large images on my RAM-limited workstation.
I have managed to write scripts that convert smaller CZI images of about 5 GB into JP2. I do this by reading a compressed representation of the image into memory. However, when I try the same trick with the 50 GB images, everything comes crashing down.
The following is a representation of my workflow:
1. Read the CZI image into memory and store it in a NumPy array.
2. Save the NumPy array in JP2 format using Glymur. To write an image in JP2 format with Glymur, the whole image needs to be loaded into memory, which is obviously a huge limitation when working with large images.
I would like to read a chunk of the CZI image and then write it into a JP2 image. This process should be repeated until the CZI image has been fully converted into its JP2 representation. If someone can show me how to write a JP2 image in chunks that would be enough to get the ball rolling, since I have seen documentation on reading chunks of CZI images into memory.
I appreciate any help or suggestions. Thank you in advance for your time.
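One way to write the JP2 side in chunks is Glymur's tile-writing interface. A minimal sketch, assuming a recent Glymur version that provides get_tilewriters(); random data stands in for the chunks you would read from the CZI file:

import numpy as np
import glymur

height, width = 4096, 4096   # full image size, known from the CZI metadata
tile = (1024, 1024)

jp2 = glymur.Jp2k("output.jp2", shape=(height, width, 3), tilesize=tile)
for tw in jp2.get_tilewriters():  # one writer per tile, in raster order
    # Replace this random chunk with the matching region of the CZI image.
    tw[:] = np.random.randint(0, 256, (tile[0], tile[1], 3), dtype=np.uint8)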

Image to pixel array in python

Info: I have 30,000 JPG images that I need to convert into (NumPy) pixel arrays.
Problem: I have tried using Pillow to do the conversion, but it processes about 2 images a second, which would take hours to complete.
from PIL import Image
import numpy as np

img_list = []
for img_num in range(30_000):
    img = Image.open(f"{img_dir}{img_num}{extension}")  # directory + number + extension
    img_list.append(np.array(img))
Question: What is the best and fastest way to convert a large number of JPG images to pixel arrays using Python?
I think what is taking the longest is the append() function.
Also, you are appending 30,000 images to img_list, which makes this single variable extremely heavy in memory. Do you actually need it? (If each image had 1000 pixels, you'd already be trying to allocate more than 30 MB.)
In OpenCV, the read function (cv2.imread) returns a NumPy array directly.
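If all the images share a known shape, one option is to preallocate a single contiguous array and fill it, instead of growing a Python list. A sketch assuming a hypothetical 256x256 RGB size and the img_dir/extension variables from the question:

import cv2
import numpy as np

# Preallocate one block up front; 256x256x3 is a stand-in for your real size.
images = np.empty((30_000, 256, 256, 3), dtype=np.uint8)
for img_num in range(30_000):
    images[img_num] = cv2.imread(f"{img_dir}{img_num}{extension}")  # BGR order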

How to save jpeg data that is identical to the original jpeg file using OpenCV

I'm using OpenCV and Python. I have loaded a JPEG image into a NumPy array. Now I want to save it back in JPEG format, but since the image was not modified, I don't want to compress it again. Is it possible to create a JPEG from the NumPy array that is identical to the JPEG it was loaded from?
I know this workflow (decode-encode without doing anything) sounds a bit stupid, but keeping the original jpeg data is not an option. I'm interested if it is possible to recreate the original jpeg just using the data at hand.
The question is different from Reading a .JPG Image and Saving it without file size change, as I don't modify anything in the picture. I really want to restore the original jpeg file based on the data at hand. I assume one could bypass the compression steps (the compression artifacts are already in the data) and just write the file in jpeg format. The question is, if this is possible with OpenCV.
Clarified answer, following comment below:
What you say makes no sense at all; you say that you have the raw, unmodified RGB data. No you don't. You have the uncompressed data that has been reconstructed from the compressed JPEG file.
The JPEG standards specify how to decompress an image or video. There is nothing in the standard about how to actually do the compression, so your original image data could have been compressed in any one of a zillion different ways. You have no way of knowing the encoding steps that produced your data, so you cannot reproduce them.
Imagine this:
"I have a number, 44, please tell me how I can get the original
numbers that this came from"
This is, essentially, what you are asking.
The only way you can do what you want (other than just copying the original file) is to read the raw file into an array before handing it to OpenCV. Then if you want to save it, just write the raw array back to a file, something like this:
import cv2
import numpy as np

fi = 'C:\\Path\\to\\Image.jpg'
fo = 'C:\\Path\\to\\Copy_Image.jpg'
with open(fi, 'rb') as myfile:
    im_array = np.frombuffer(myfile.read(), dtype=np.uint8)
# Do stuff here
image = cv2.imdecode(im_array, cv2.IMREAD_COLOR)  # decoded pixels for processing
# Do more stuff here
with open(fo, 'wb') as myfile:
    myfile.write(im_array)  # writes the untouched JPEG bytes back out
Of course, this means the data is effectively stored twice in memory, but it seems to me to be your only option.
Sometimes, no matter how hard you want to do something, you have to accept that it just cannot be done.

Faster method of scaling down image array in Python using numpy and pyfits

I'm using Python 2.7.3 with numpy and pyfits to process scientific FITS files. I would like to work on the images at half or one quarter resolution for the sake of speed, and have this code:
# Read red image
hdulist = pyfits.open(red_fn)
img_data = hdulist[0].data
hdulist.close()
img_data_r = numpy.array(img_data, dtype=float)
# Scale it down to one quarter size
my = []
for line in img_data_r[::4]:
    myline = []
    for item in line[::4]:
        myline.append(item)
    my.append(myline)
img_data_r = my
This works, but I wonder if there is a faster, more native way to reduce the array. The reductions should happen as early as possible, the idea being that the data that will be processed is of minimal acceptable size. If there was a way of reading a reduced dataset with pyfits, that would be ideal. But such a method doesn't seem to exist (correct me if I'm wrong). How about numpy? Or scipy/math/anything else?
The data array you get from pyfits already is a NumPy array; you don't need to create one from it. Moreover, you can do the downsampling in a single step:
img_data_r = hdulist[0].data[::4, ::4]
This won't copy the data; it simply creates a new view with different strides. If you need the down-sampled image as a contiguous array, use numpy.ascontiguousarray().
This method of downsampling only keeps one pixel in sixteen and completely drops the information in all the others. If you need higher-quality downsampling, rather than doing it in your code, you are probably better off downsampling your FITS files with ImageMagick. This will also reduce the time it takes to read the files from disk.
To convert all your FITS files in the current directory in place (warning: the original full-size versions get overwritten), you could use
mogrify -resize 25% *.fits
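If you would rather stay in NumPy, a 4x4 block mean uses all sixteen pixels of each block instead of discarding fifteen of them. A sketch, assuming the array dimensions are multiples of 4:

h, w = img_data_r.shape
img_small = img_data_r.reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))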
