Python PIL: open many files and load them into memory

I have a dataset containing 3000 images in train and 6000 images in test. They are 320x320 RGB PNG files. I thought that I could load this entire dataset into memory (since it's just 100mb), but when I try to do that I get a "[Errno 24] Too many open files: ..." error. The loading code looks like this:
train_images = []
for index, row in dataset_p_train.iterrows():
    path = data_path / row.img_path
    train_images.append(Image.open(path))
I know that I'm opening 9000 files and not closing them, which isn't good practice, but unfortunately my classifier relies heavily on PIL's img.getcolors() method, so I really want to store that dataset in memory as a list of PIL images and not as a 3000x320x320x3 uint8 numpy array, to avoid casting back to a PIL image each time I need an image's colors.
So, what should I do? Somehow increase the limit of open files? Or is there a way to make PIL images reside entirely in memory without keeping a file open on disk?

Image.open is lazy. It will not load the data until you try to do something with it.
You can call the image's load method to explicitly load the file contents. This will also close the file, unless the image has multiple frames (for example, an animated GIF).
See File Handling in Pillow for more details.
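A minimal self-contained sketch of this fix follows. It generates one tiny sample PNG in a temp directory (the question's dataset_p_train loop would produce one path per row instead); the key line is the explicit load() call after open():

```python
import tempfile
from pathlib import Path
from PIL import Image

# create a tiny sample PNG so the sketch is self-contained
tmpdir = Path(tempfile.mkdtemp())
sample = tmpdir / "sample.png"
Image.new("RGB", (320, 320), color=(10, 20, 30)).save(sample)

images = []
for path in [sample]:  # in the question: one path per dataset row
    img = Image.open(path)
    img.load()         # read the pixel data into memory and close the file
    images.append(img)

# getcolors() still works on the fully in-memory image
print(images[0].getcolors())  # [(102400, (10, 20, 30))]
```

Because load() releases the underlying file handle, the loop never accumulates open file descriptors, no matter how many images the list holds.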

Related

Saving an Image as an OIB File in Python

I want to save an image/an array as an OIB File.
I have tried using the oiffile library. I am able to open and read OIB files, but I want to save an image as an OIB File.
Since oiffile exposes image data as numpy arrays (the same structure cv2 uses for opening/closing images), you could open the image using imread(). Then you can use imwrite() for saving/writing the image file to a destination path.

How to save jpeg data that is identical to the original jpeg file using OpenCV

I'm using OpenCV and Python. I have loaded a jpeg image into a numpy array. Now i want to save it back into jpeg format, but since the image was not modified, I don't want to compress it again. Is it possible to create a jpeg from the numpy array that is identical with the jpeg that it was loaded from?
I know this workflow (decode-encode without doing anything) sounds a bit stupid, but keeping the original jpeg data is not an option. I'm interested if it is possible to recreate the original jpeg just using the data at hand.
The question is different from Reading a .JPG Image and Saving it without file size change, as I don't modify anything in the picture. I really want to restore the original jpeg file based on the data at hand. I assume one could bypass the compression steps (the compression artifacts are already in the data) and just write the file in jpeg format. The question is, if this is possible with OpenCV.
Clarified answer, following comment below:
What you say makes no sense at all. You say that you have the raw, unmodified RGB data. No, you don't. You have uncompressed data that has been reconstructed from the compressed JPEG file.
The JPEG standards specify how to decompress an image / video. There is nothing in the standard about how to actually perform the compression, so your original image data could have been compressed in any one of a zillion different ways. You have no way of knowing the encoding steps that were used to produce your data, so you cannot reverse them.
Imagine this:

"I have a number, 44. Please tell me the original numbers that this came from."

This is, essentially, what you are asking.
The only way you can do what you want (other than just copying the original file) is to read the image file into a byte array before handing it to OpenCV. Then, if you want to save it, just write that raw array back to a file, something like this:

import cv2
import numpy as np

fi = 'C:\\Path\\to\\Image.jpg'
fo = 'C:\\Path\\to\\Copy_Image.jpg'

with open(fi, 'rb') as myfile:
    im_array = np.frombuffer(myfile.read(), dtype=np.uint8)

# Do stuff here
image = cv2.imdecode(im_array, cv2.IMREAD_COLOR)
# Do more stuff here

with open(fo, 'wb') as myfile:
    myfile.write(im_array)
Of course, it means you will have the data stored twice, effectively, in memory, but this seems to me to be your only option.
Sometimes, no matter how hard you want to do something, you have to accept that it just cannot be done.

ffmpeg - output images in memory instead of disk

I have a python script which basically converts a video into images and stores them in a folder; then all these images are read, information is extracted from them, and the images are deleted. Since the image-writing step is so slow and is apparently useless for what I need, I would like to store the images in memory instead of on disk, read them from there, and do my operations; this would speed up my process a lot.
Now my code look like:
1st step:
ffmpeg -i myvideo.avi -r 1 -f image2 C:\img_temp\img-%04d.png
2nd step:
for i in range(1, len(os.listdir(IMGTEMP))):
    # My operations for each image
3rd step:
for image in os.listdir(IMGTEMP):
    os.remove(IMGTEMP + "\\" + image)
With MoviePy:
import moviepy.editor as mpy
clip = mpy.VideoFileClip("video.avi")
for frame in clip.iter_frames():
    # do something with the frame (a HxWx3 numpy array)
The short version is that you could use a ramdisk or tmpfs or something like that, so that the files are indeed actually stored in memory. However, I'm wondering about your "operations for each image". Do you really need an image file for them? If all you're doing is reading their size, why do you need the image (with its compression/decompression overhead) at all? Why not just use the FFmpeg API: read the AVI file, decode frames, and compute your metrics on the decoded data directly?
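If you do end up holding encoded image bytes in memory (for example, read from an ffmpeg image2pipe pipe instead of a temp folder), you never need to touch the disk: wrap the bytes in io.BytesIO and hand that to your image library. A minimal sketch with Pillow; the PNG bytes are generated in-process here just to keep the example self-contained:

```python
import io
from PIL import Image

# stand-in for encoded PNG bytes you would otherwise read from an ffmpeg pipe
buf = io.BytesIO()
Image.new("RGB", (64, 48), color=(255, 0, 0)).save(buf, format="PNG")
png_bytes = buf.getvalue()

# decode straight from memory -- no temp files on disk
frame = Image.open(io.BytesIO(png_bytes))
print(frame.size)  # (64, 48)
```

The same BytesIO trick works with any library that accepts file-like objects, so each frame can be decoded, processed, and discarded without ever creating a file in IMGTEMP.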
Have a look at PyAV. (There is documentation, but it's rather sparse.)
It looks like you could just open a video, and then iterate over the frames.

Changing of pixel values after writing the same image using imwrite opencv python function

import cv2
import numpy as np
im=cv2.imread('test.jpg')
cv2.imwrite('result.jpg',im)
Here test.jpg has a size of 19 KB and result.jpg has 41 KB, even though they are the same image.
I observed that there is a change in the pixel values between these two images.
How can I prevent this?
Re-writing or 'saving' an image with any library or tool will always create a new file with 'new pixel values'. This happens because the image is re-encoded at a certain quality when it is saved. The saved image's quality and size depend on the library doing the saving, so default values for quality, depth, compression, etc. are applied.
If you just want to create a copy of the image in a new file, either copy the file directly at the OS level or read the whole file in binary mode and write it to a new one - without using any image processing libraries.
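The byte-for-byte copy needs nothing but the standard library. A minimal sketch (the file names mirror the question; the JPEG payload is a placeholder so the example is self-contained):

```python
import shutil
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
src = tmp / "test.jpg"
dst = tmp / "result.jpg"
# placeholder bytes standing in for a real JPEG file
src.write_bytes(b"\xff\xd8\xff\xe0" + b"fake jpeg payload")

# copy the raw bytes without ever decoding the image
shutil.copyfile(src, dst)

print(src.read_bytes() == dst.read_bytes())  # True
```

Because no decode/encode step runs, the copy is bit-identical: same size, same pixel values, same compression artifacts.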

How to read a large image in chunks in python?

I'm trying to compute the difference in pixel values of two images, but I'm running into memory problems because the images I have are quite large. Is there a way in python to read an image in, let's say, 10x10 chunks at a time rather than trying to read in the whole image? I was hoping to solve the memory problem by reading the image in small chunks, assigning those chunks to numpy arrays, and then saving those numpy arrays using pytables for further processing. Any advice would be greatly appreciated.
Regards,
Berk
You can use numpy.memmap and let the operating system decide which parts of the image file to page in or out of RAM. If you use 64-bit Python, the virtual memory space is astronomical compared to the available RAM.
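A minimal sketch of the memmap approach, using a small raw (headerless) file so it stays self-contained; for a real image file you would pass the header size via memmap's offset argument:

```python
import os
import tempfile
import numpy as np

# write a raw 100x100 uint8 "image" to disk
fd, path = tempfile.mkstemp(suffix=".raw")
os.close(fd)
np.arange(100 * 100, dtype=np.uint8).reshape(100, 100).tofile(path)

# map the file; the OS pages data in only as slices are touched
img = np.memmap(path, dtype=np.uint8, mode="r", shape=(100, 100))

# read one 10x10 chunk without loading the rest of the file
chunk = np.array(img[0:10, 0:10])  # copy just this tile into RAM
print(chunk.shape)  # (10, 10)
```

Iterating such tile slices over two memmapped arrays lets you compute the pixel-wise difference chunk by chunk, keeping only one tile per image in RAM at a time.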
If you have time to preprocess the images you can convert them to bitmap files (which will be large, not compressed) and then read particular sections of the file via offset as detailed here:
Load just part of an image in python
Conversion from any file type to bitmap can be done in Python with this code:
from PIL import Image
file_in = "inputCompressedImage.png"
img = Image.open(file_in)
file_out = "largeOutputFile.bmp"
img.save(file_out)
