Python Pillow Image to PDF and then merging memory issues - python

Goal:
Convert a finite number of .jpg files and merge them into one PDF file.
Expected result:
The files from the folder are successfully converted and merged into one PDF file at the specified location.
Problem:
When the total size of the files exceeds a certain amount (around 400 MB in my tests), the program crashes with the following message:
Traceback (most recent call last):
File "C:\Users\kaczk\AppData\Local\Programs\Python\Python38-32\lib\site-packages\PIL\ImageFile.py", line 498, in _save
fh = fp.fileno()
io.UnsupportedOperation: fileno
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "MakePDF.py", line 10, in <module>
im1.save(pdf1_filename, "PDF" ,resolution=1000.0, save_all=True, append_images=imageList)
File "C:\Users\kaczk\AppData\Local\Programs\Python\Python38-32\lib\site-packages\PIL\Image.py", line 2084, in save
save_handler(self, fp, filename)
File "C:\Users\kaczk\AppData\Local\Programs\Python\Python38-32\lib\site-packages\PIL\PdfImagePlugin.py", line 46, in _save_all
_save(im, fp, filename, save_all=True)
File "C:\Users\kaczk\AppData\Local\Programs\Python\Python38-32\lib\site-packages\PIL\PdfImagePlugin.py", line 175, in _save
Image.SAVE["JPEG"](im, op, filename)
File "C:\Users\kaczk\AppData\Local\Programs\Python\Python38-32\lib\site-packages\PIL\JpegImagePlugin.py", line 770, in _save
ImageFile._save(im, fp, [("jpeg", (0, 0) + im.size, 0, rawmode)], bufsize)
File "C:\Users\kaczk\AppData\Local\Programs\Python\Python38-32\lib\site-packages\PIL\ImageFile.py", line 513, in _save
fp.write(d)
MemoryError
Watching the program in Task Manager, I noticed that the computer does indeed run out of RAM while it executes. Below is the code used.
import os
from PIL import Image
fileList = os.listdir(r'C:\location\of\photos\folder')
imageList = []
im1 = Image.open(os.path.join(r'C:\location\of\photos\folder', fileList[0]))
for file in fileList[1:]:
    imageList.append(Image.open(os.path.join(r'C:\location\of\photos\folder', file)))
pdf1_filename = r'C:\location\of\pdf\destination.pdf'
im1.save(pdf1_filename, "PDF", resolution=500.0, save_all=True, append_images=imageList)
Am I making a simple mistake here regarding memory usage? Is there a different module that would make the task easier when working with more and larger files? I would be very grateful for any help.

This question is quite old, but since I got here while struggling with the same issue, here is an answer.
You simply have to close your images after using them:
im1.close()
for i in imageList:
    i.close()
This solved it for me.
PS: take a look at glob, it eases working with paths a lot.
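The two suggestions above can be combined as in the following minimal sketch, assuming Pillow is installed. glob collects the full .jpg paths, and contextlib.ExitStack closes every opened image even if save() raises. The names find_jpegs and merge_to_pdf are hypothetical, and whether closing the images is enough for very large batches has not been verified here.

```python
import glob
import os
from contextlib import ExitStack


def find_jpegs(folder):
    """Return the full paths of all .jpg files in folder, sorted by name."""
    return sorted(glob.glob(os.path.join(folder, "*.jpg")))


def merge_to_pdf(folder, pdf_path, resolution=500.0):
    """Merge every .jpg in folder into one PDF, then close all images."""
    # Pillow is imported here so that find_jpegs also works without it.
    from PIL import Image

    paths = find_jpegs(folder)
    if not paths:
        raise ValueError("no .jpg files found in " + folder)

    with ExitStack() as stack:
        # Each opened image is registered with the stack, so it is closed
        # when the with-block ends, whether or not save() succeeds.
        images = [stack.enter_context(Image.open(p)) for p in paths]
        images[0].save(
            pdf_path,
            "PDF",
            resolution=resolution,
            save_all=True,
            append_images=images[1:],
        )
```

A call would look like merge_to_pdf(r'C:\location\of\photos\folder', r'C:\location\of\pdf\destination.pdf'), reusing the paths from the question.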

Related

BadRarFile when extracting single file using RarFile in Python

I need to extract a single file (~10kB) from many very large RAR files (>1Gb). The code below shows a basic implementation of how I'm doing this.
from rarfile import RarFile
rar_file='D:\\File.rar'
file_of_interest='Folder 1/Subfolder 2/File.dat'
output_folder='D:/Output'
rardata = RarFile(rar_file)
rardata.extract(file_of_interest, output_folder)
rardata.close()
However, the extract instruction is returning the following error: rarfile.BadRarFile: Failed the read enough data: req=16384 got=52
When I open the file using WinRAR, I can extract the file successfully, so I'm sure the file isn't corrupted.
I've found some similar questions, but not a definite answer that worked for me.
Can someone help me to solve this error?
Additional info:
Windows 10 build 1909
Spyder 5.0.0
Python 3.8.1
Complete traceback of the error:
Traceback (most recent call last):
File "D:\Test\teste_rar_2.py", line 27, in <module>
rardata.extract(file_of_interest, output_folder)
File "C:\Users\bernard.kusel\AppData\Local\Continuum\anaconda3\lib\site-packages\rarfile.py", line 826, in extract
return self._extract_one(inf, path, pwd, True)
File "C:\Users\bernard.kusel\AppData\Local\Continuum\anaconda3\lib\site-packages\rarfile.py", line 912, in _extract_one
return self._make_file(info, dstfn, pwd, set_attrs)
File "C:\Users\bernard.kusel\AppData\Local\Continuum\anaconda3\lib\site-packages\rarfile.py", line 927, in _make_file
shutil.copyfileobj(src, dst)
File "C:\Users\bernard.kusel\AppData\Local\Continuum\anaconda3\lib\shutil.py", line 79, in copyfileobj
buf = fsrc.read(length)
File "C:\Users\bernard.kusel\AppData\Local\Continuum\anaconda3\lib\site-packages\rarfile.py", line 2197, in read
raise BadRarFile("Failed the read enough data: req=%d got=%d" % (orig, len(data)))
BadRarFile: Failed the read enough data: req=16384 got=52

IO Error python PIL image preprocessing script

I am following this tutorial and specifically going through the "generate own data" section:
https://github.com/surfertas/deep_learning/tree/master/projects/imdbwiki-challenge
https://github.com/surfertas/deep_learning/blob/master/projects/imdbwiki-challenge/imdb_preprocess.py
and I am facing this issue when running the imdb_preprocess.py script:
Dictionary created...
Converting 1000 samples. (0=all samples)
Traceback (most recent call last):
File "imdb_preprocess.py", line 137, in <module>
main()
File "imdb_preprocess.py", line 131, in main
create_and_dump(imdb_dict, args.partial)
File "imdb_preprocess.py", line 106, in create_and_dump
for img_path in imgs
File "/usr/lib64/python2.7/site-packages/scipy/misc/pilutil.py", line 156, in imread
im = Image.open(name)
File "/usr/lib64/python2.7/site-packages/PIL/Image.py", line 2477, in open
fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: u'/path/48/10000548_1925-04-04_1964.jpg'
I manually checked folder 48 and confirmed that the image it is complaining about is indeed there.
Any hints on where the fault is?
(The path in the traceback was replaced.)

"OSError: cannot identify image file" opening image with PIL/Image

I am trying to get some code working that has broken but was working before. I have a PNG file on my desktop and I simply want to open it using the Image module from PIL.
from PIL import Image
img_dir = r'C:\Users\DylanDB\Desktop\square.png'
img = Image.open(img_dir)
This is a stripped-down version of my more advanced code, where the same error occurs. The error is:
Traceback (most recent call last):
File "C:/Users/DylanDB/Desktop/img_test.py", line 5, in <module>
img = Image.open(img_dir)
File "C:\Python34\lib\site-packages\PIL\Image.py", line 2317, in open
% (filename if filename else fp))
OSError: cannot identify image file 'C:\\Users\\DylanDB\\Desktop\\square.png'
I had the same error, and it was because the file had just been created and was not closed properly before being opened with Image.open(). After closing the file with f.close(), it worked as expected.
I found that the file was a corrupted image.
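Both answers point at a file whose bytes are not what a PNG decoder expects. As a rough first check, the sketch below tests whether a file starts with the 8-byte PNG signature, using only the standard library. The helper name looks_like_png is hypothetical. A file that passes can still be corrupted further in, so this only rules out the empty or truncated-at-the-start case.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every valid PNG file


def looks_like_png(path):
    """Return True if the file at path starts with the PNG signature."""
    with open(path, "rb") as f:
        return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


def write_and_close(path, data):
    """Write data to path; the with-block closes (and flushes) the file."""
    with open(path, "wb") as f:
        f.write(data)
```

Writing files inside a with-block, as in write_and_close, guarantees the file is closed before Image.open() is called on it afterwards.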

Open process and save specific images in related folder

I'm looking for a way to open and crop several TIFF images and then save the new cropped images in the same folder (relative to my script folder).
My current code looks like this:
from PIL import Image
import os,platform
filespath = os.path.join(os.environ['USERPROFILE'],"Desktop\Python\originalImagesfolder")
for file in os.listdir(filespath):
    if file.endswith(".tif"):
        im = Image.open(file)
        im.crop((3000, 6600, 3700, 6750)).save(file+"_crop.tif")
This script is returning me the error:
Traceback (most recent call last):
File "C:\Users...\Desktop\Python\script.py", line 22, in <module>
im = Image.open(file)
File "C:\Python34\lib\site-packages\PIL\Image.py", line 2219, in open
fp = builtins.open(fp, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'Image1Name.tif'
'Image1Name.tif' is the first TIFF image I'm trying to process in the folder. I don't understand how the script can report the file's name without being able to find it. Any help?
PS: I have two days of experience with Python and with code in general. Sorry if the answer is obvious.
[EDIT/Update]
After modifying my initial code thanks to the answers from vttran and ChrisGuest, it now looks like this:
from PIL import Image
import os,platform
filespath = os.path.join(os.environ['USERPROFILE'],"Desktop\Python\originalImagesfolder")
for file in os.listdir(filespath):
    if file.endswith(".tif"):
        filepath = os.path.join(filespath, file)
        im = Image.open(filepath)
        im.crop((3000, 6600, 3700, 6750)).save("crop"+file)
the script is returning me a new error message:
Traceback (most recent call last):
File "C:/Users/.../Desktop/Python/script.py", line 11, in <module>
im.crop((3000, 6600, 3700, 6750)).save("crop"+file)
File "C:\Python34\lib\site-packages\PIL\Image.py", line 986, in crop
self.load()
File "C:\Python34\lib\site-packages\PIL\ImageFile.py", line 166, in load
self.load_prepare()
File "C:\Python34\lib\site-packages\PIL\ImageFile.py", line 250, in load_prepare
self.im = Image.core.new(self.mode, self.size)
ValueError: unrecognized mode
One possibly useful detail: it's a Landsat 8 image in GeoTIFF format, so the TIFF file also includes geoposition and projection information. The script works perfectly fine if I first open and re-save the images with software like Photoshop (16-bit integer TIFF format).
When you search for the file names, you use filespath to specify the directory.
But when you open the file, you use only the base filename.
So you could replace
im = Image.open(file)
with
filepath = os.path.join(filespath, file)
im = Image.open(filepath)
Also consider using the glob module, as you can do glob.glob(r'path\*.tif').
It is also good practice to avoid using the names of builtins as variable names (file is a builtin in Python 2).
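The advice in this answer can be sketched as follows, assuming Pillow is installed. glob returns paths that already include the directory, so no separate os.path.join is needed before Image.open(). The names tif_paths, cropped_path and crop_all are hypothetical, and the sketch does not address the later "unrecognized mode" error from the question's update.

```python
import glob
import os


def tif_paths(folder):
    """Return the full paths of the .tif files in folder, sorted by name."""
    return sorted(glob.glob(os.path.join(folder, "*.tif")))


def cropped_path(path):
    """Build an output path beside the input, e.g. a.tif -> a_crop.tif."""
    root, ext = os.path.splitext(path)
    return root + "_crop" + ext


def crop_all(folder, box):
    """Crop every .tif in folder to box and save it next to the original."""
    # Pillow is imported here so the path helpers also work without it.
    from PIL import Image

    for path in tif_paths(folder):  # 'path' rather than 'file' as the name
        with Image.open(path) as im:
            im.crop(box).save(cropped_path(path))
```

With the values from the question, the call would be crop_all(filespath, (3000, 6600, 3700, 6750)). Running it a second time would also pick up the previously written _crop.tif files, so a separate output folder may be preferable.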

Pillow's Image.open() MemoryError while opening big tiff file

I am writing a Python script that will run on a server and resize images (namely InDesign links). It all works fine, except when I try to open bigger TIFF files (800 MB).
I am using Pillow's Image.open() in order to resize the image, but I get a MemoryError. I tried other libraries such as ImageMagick and tifffile, with the same results. I could split the image into smaller chunks, resize those and then combine them, but I have no idea how to do that without opening the file first.
I searched extensively but cannot find a solution that seems appropriate to my case. I also watched memory consumption, and it doesn't seem to be unusually high. It's safe to say I am completely lost.
The whole trace goes like this:
Traceback (most recent call last):
File "app.py", line 87, in <module>
convert()
File "X:\Development\Python\zipper\converter.py", line 112, in convert
if saveAsJPEG(file, name + ".jpg"): converted_images = updateCounter(all_images, converted_images)
File "X:\Development\Python\zipper\converter.py", line 37, in saveAsJPEG
print Image.open(a).size
File "C:\Python27\lib\site-packages\PIL\Image.py", line 2266, in open
im = factory(fp, filename)
File "C:\Python27\lib\site-packages\PIL\ImageFile.py", line 97, in __init__
self._open()
File "C:\Python27\lib\site-packages\PIL\TiffImagePlugin.py", line 637, in _open
self._seek(0)
File "C:\Python27\lib\site-packages\PIL\TiffImagePlugin.py", line 672, in _seek
self.tag.load(self.fp)
File "C:\Python27\lib\site-packages\PIL\TiffImagePlugin.py", line 458, in load
data = ImageFile._safe_read(fp, size)
File "C:\Python27\lib\site-packages\PIL\ImageFile.py", line 521, in _safe_read
block = fp.read(min(size, SAFEBLOCK))
MemoryError
Thank you all for any help!
--- EDIT
This is the code that throws the error:
def saveAsJPEG(file, newfile):
    print "\nopening:", file
    try:
        with Image.open(file) as im:
            if not im.size[0] < size[0] or im.size[1] < size[1]:
                new_im = im.resize(size, Image.ANTIALIAS)
                new_im.save(newfile, 'JPEG', quality=100, dpi=(72, 72))
        # Force a memory dump. Otherwise memory will get cluttered up -> I am not sure if this is necessary as it doesn't seem to do anything.
        del im
        del new_im
        collect()
        return True
    except:
        print "image is too big -> try something else. But what?"
        # this line below throws error (MemoryError). It was the same without try-except part before.
        Image.open(file)
The function is used in a for loop, where images is a list of full file paths (X:\path\to\my\image.tiff):
for file in images:
    # gets extension
    name = basename(splitext(file)[0])
    ext = splitext(file)[1]
    # check for extension type and run appropriate function
    if ext == '.jpg' or ext == '.jpeg' or ext == '.tif' or ext == '.tiff':
        if saveAsJPEG(file, name + ".jpg"): converted_images = updateCounter(all_images, converted_images)
    del file, name, ext
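One detail in saveAsJPEG that is separate from the MemoryError: the size check mixes not and or without parentheses. The sketch below shows how Python groups that condition and contrasts it with one possible intended reading. The intended reading is an assumption, since the question does not state it, and all three function names are hypothetical.

```python
def as_written(im_size, size):
    """The condition exactly as it appears in saveAsJPEG."""
    return not im_size[0] < size[0] or im_size[1] < size[1]


def as_parsed(im_size, size):
    """How Python groups it: 'not' applies only to the first comparison."""
    return (not (im_size[0] < size[0])) or (im_size[1] < size[1])


def neither_side_smaller(im_size, size):
    """Assumed intent: true only if the image is not smaller in either dimension."""
    return not (im_size[0] < size[0] or im_size[1] < size[1])


# An image smaller than the target in both dimensions:
small, target = (100, 100), (200, 200)
print(as_written(small, target))            # True, because the height test passes
print(neither_side_smaller(small, target))  # False
```

If the second reading is the one that was meant, adding the parentheses changes which images get resized.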
