Heroku web dyno running Django app not releasing memory - python

I'm struggling with a memory issue on Heroku when running a Django application (with gunicorn).
I have the following code that takes a user-uploaded image, removes all EXIF data, and returns the image ready to be uploaded to S3. This is used both as a form data cleaner and when reading base64 data into memory.
import StringIO

from PIL import Image
from django.core.files.uploadedfile import InMemoryUploadedFile
from django.utils.html import strip_tags


def sanitise_image(img):  # img is an InMemoryUploadedFile
    try:
        image = Image.open(img)
    except IOError:
        return None

    # Move all pixel data into a new PIL image
    data = list(image.getdata())
    image_without_exif = Image.new(image.mode, image.size)
    image_without_exif.putdata(data)

    # Create a new file with image_without_exif instead of the input image.
    thumb_io = StringIO.StringIO()
    image_without_exif.save(thumb_io, format=image.format)
    io_len = thumb_io.len
    thumb_file = InMemoryUploadedFile(thumb_io, None, strip_tags(img.name), img.content_type,
                                      io_len, None)

    # DEL AND CLOSE EVERYTHING
    thumb_file.seek(0)
    img.close()
    del img
    thumb_io.close()
    image_without_exif.close()
    del image_without_exif
    image.close()
    del image
    return thumb_file
I basically take an InMemoryUploadedFile and return a new one with just the pixel data.
The dels and closes may be redundant, but they represent my attempt to fix a situation where Heroku memory usage keeps growing each time this function runs and is never released, even after leaving the app idle overnight.
Running this on localhost with Guppy and following the tutorial, I find no remaining InMemoryUploadedFile, StringIO, or PIL Image objects left in the heap, which leaves me puzzled.
My suspicion is that Python does not release the memory back to the OS, as I've read in multiple threads on SO. Has anyone played around with InMemoryUploadedFile who can explain why this memory is not being released?
When I do not perform this sanitisation, the issue does not occur.
Thanks a lot!

I think the issue is creating the temporary list object:
data = list(image.getdata())
Try:
image_without_exif.putdata(image.getdata())
This is why I think that is the issue:
>>> images = [Image.new('RGBA', (100, 100)) for _ in range(100)]
Python memory usage increased by ~4 MB.
>>> get_datas = [image.getdata() for image in images]
No memory increase.
>>> pixel_lists = [list(image.getdata()) for image in images]
Python memory usage increased by ~85 MB.
You probably don't want to make getdata() into a list unless you need the numbers explicitly. From the Pillow docs:
Note that the sequence object returned by this method is an internal PIL data type, which only supports certain sequence operations. To convert it to an ordinary sequence (e.g. for printing), use list(im.getdata()).
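A quick way to reproduce that comparison outside the REPL is to watch the process's resident set size around each step (a sketch, assuming psutil is installed; the absolute numbers will vary by platform and Pillow version):

import psutil
from PIL import Image

def rss_mb():
    # Resident set size of the current process, in MB
    return psutil.Process().memory_info().rss / (1024.0 * 1024.0)

base = rss_mb()
images = [Image.new('RGBA', (100, 100)) for _ in range(100)]
print('after Image.new: %.1f MB' % (rss_mb() - base))

get_datas = [image.getdata() for image in images]
print('after getdata(): %.1f MB' % (rss_mb() - base))

pixel_lists = [list(image.getdata()) for image in images]
print('after list(getdata()): %.1f MB' % (rss_mb() - base))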

I found my own answer eventually. Huge thanks to Ryan Tran for pointing me in the right direction. list() does indeed cause the leak.
Using the equivalent split() and merge() methods (docs), this is the updated code:
with Image.open(img) as image:
    comp = image.split()
    image_without_exif = Image.merge(image.mode, comp)

    thumb_io = StringIO.StringIO()
    image_without_exif.save(thumb_io, format=image.format)
    io_len = thumb_io.len

    clean_img = InMemoryUploadedFile(thumb_io, None, strip_tags(img.name), img.content_type,
                                     io_len, None)
    clean_img.seek(0)
    return clean_img
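For reference, on Python 3 the same fix can be written with io.BytesIO in place of StringIO (a sketch of the function above, assuming strip_tags comes from django.utils.html as in the original code):

import io

from PIL import Image
from django.core.files.uploadedfile import InMemoryUploadedFile
from django.utils.html import strip_tags

def sanitise_image(img):
    try:
        image = Image.open(img)
    except IOError:
        return None
    with image:
        # Rebuild the image from its raw bands only, dropping EXIF and other metadata
        image_without_exif = Image.merge(image.mode, image.split())
        thumb_io = io.BytesIO()
        image_without_exif.save(thumb_io, format=image.format)
    clean_img = InMemoryUploadedFile(
        thumb_io, None, strip_tags(img.name), img.content_type,
        thumb_io.getbuffer().nbytes, None)
    clean_img.seek(0)
    return clean_img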

Related

Python Running out of Memory loading images

I have a notebook running inside a Docker container on a machine with limited RAM, with Python 3.6. I am trying to load a large number of images using something like the following code.
import glob

import cv2
import numpy as np
import keras


class DataGenerator(keras.utils.Sequence):
    def get_images(self):
        img_list = []
        for img_path in self.img_paths:  # "loop through the contents of folder"; attribute name assumed
            img = cv2.imread(img_path)
            img = cv2.resize(img, (self.img_size, self.img_size))
            img = img / 255.0
            img = img - np.mean(img)
            img_list.append(img)
        img_list = np.asarray(img_list, dtype=np.float16)
        return img_list
I am running out of RAM very quickly and the kernel crashes. There is no other code; I narrowed it down to this part, which takes up all the RAM. Ideally, shouldn't the variables be deleted once the function call has finished?
Initially the code looked like this
class DataGenerator(keras.utils.Sequence):
    def get_images(self):
        img_list = []
        for img_path in self.img_paths:  # same folder loop as above
            img = cv2.imread(img_path)
            img = cv2.resize(img, (self.img_size, self.img_size))
            img = np.array(img) / 255.0
            img = img - np.mean(img)
            img_list.append(img)
        return np.array(img_list)
But I saw in the debugger that the list created by this part of the code takes up too much memory, and even after the function returns the NumPy array, the img_list variable keeps taking up space in memory. So I assigned the NumPy array to the name img_list, hoping that it would remove the reference to the list. I also changed np.array to np.asarray to convert it into a NumPy array in place instead of creating a copy, and I changed the dtype to np.float16.
After assigning the name img_list to the NumPy array, the original list no longer shows up in the debugger, but the amount of memory consumed remains the same.
I checked the size of the returned NumPy array and it is 1 GB, but this small function call takes up 6 GB of my memory, and that memory is not freed even after I call the garbage collector manually using gc.collect().
I am running this in JupyterLab, if that is relevant. I have looked everywhere, and according to all the answers, calling the garbage collector manually is not necessary in most cases because Python's garbage collection is reliable, and if it does have to be called, doing so should resolve the issue. But for me it does nothing at all: it reports that a number of objects were collected, yet that frees essentially no RAM.
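One way to keep the peak closer to the size of the final array is to preallocate the float16 output and fill it one image at a time, so the full-precision intermediates are never all alive at once (a sketch; the folder layout and function name are assumptions):

import glob

import cv2
import numpy as np

def load_images(folder, img_size):
    paths = sorted(glob.glob(folder + '/*.jpg'))  # assumed layout
    out = np.empty((len(paths), img_size, img_size, 3), dtype=np.float16)
    for i, img_path in enumerate(paths):
        img = cv2.imread(img_path)
        img = cv2.resize(img, (img_size, img_size))
        img = img / 255.0
        out[i] = img - np.mean(img)  # cast down to float16 on assignment
    return out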

Improving copying bytes from an Image

I have the following minimal code that gets the bytes from an image:
import Image
im = Image.open("kitten.png")
im_data = [pix for pixdata in im.getdata() for pix in pixdata]
This is rather slow (I have gigabytes of images to process), so how could this be sped up? I'm also unfamiliar with what exactly that code is trying to do. All my data is 1280 x 960 x 8-bit RGB, so I can ignore corner cases, etc.
(FYI, the full code is here - I've already replaced the ImageFile loop with the above Image.open().)
You can try
scipy.ndimage.imread()
If you mean speeding it up algorithmically, I can suggest accessing the file with multiple threads simultaneously (only if there is no dependency between the processing steps):
divide the file logically into a few sections and access each part simultaneously with threads (you have to put your operation inside a function and call it from the threads).
Here is a link to a tutorial about threading in Python:
threading in Python
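A minimal sketch of that suggestion using the standard library's concurrent.futures (note that for a pure-Python pixel loop the GIL limits what threads can gain; a ProcessPoolExecutor has the same interface and sidesteps it; file names here are hypothetical):

from concurrent.futures import ThreadPoolExecutor
from PIL import Image

def image_to_bytes(path):
    # Same flattening as in the question, one file per worker
    with Image.open(path) as im:
        return [pix for pixdata in im.getdata() for pix in pixdata]

paths = ['kitten1.png', 'kitten2.png']  # hypothetical file names
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(image_to_bytes, paths))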
I solved my problem, I think:
>>> ([pix for pixdata in im.getdata() for pix in pixdata] ==
...  numpy.ndarray.tolist(numpy.ndarray.flatten(numpy.asarray(im))))
True
This cuts down the runtime by half, and with a bit of bash magic I can run the conversion on the 56 directories in parallel.
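If whatever consumes im_data can work with a NumPy array directly, skipping the final tolist() avoids building millions of Python ints and is faster still (a sketch):

import numpy as np
from PIL import Image

def image_bytes(path):
    with Image.open(path) as im:
        # uint8 array of shape (960, 1280, 3) for the images described above,
        # flattened to R, G, B, R, G, B, ... just like the list comprehension
        return np.asarray(im).ravel()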

Changing image type in Django

I have an image of ImageFieldFile type in Django. If I do print type(image), I get <class 'django.db.models.fields.files.ImageFieldFile'>
Next, I opened this using PIL's Image.open(image), resized it via image.resize((20, 20)), and closed it with image.close().
After closing it, I notice image's type has changed to <class 'PIL.Image.Image'>.
How do I change it back to <class 'django.db.models.fields.files.ImageFieldFile'>? I thought .close() would suffice.
The way I got around this was to save it to a BytesIO object then stuff that into an InMemoryUploadedFile. So something like this:
from io import BytesIO

from PIL import Image
from django.core.files.uploadedfile import InMemoryUploadedFile

# Where image is your ImageFieldFile
pil_image = Image.open(image)
pil_image = pil_image.resize((20, 20))  # resize() returns a new image rather than modifying in place

image_bytes = BytesIO()
pil_image.save(image_bytes, format='PNG')  # a format must be given when saving to a file object; PNG is just an example

new_image = InMemoryUploadedFile(
    image_bytes, None, image.name, image.type, None, None, None
)
image_bytes.close()
Not terribly graceful, but it got the job done. This was done in Python 3. Not sure of Python 2 compatibility.
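If the end goal is to put the resized file back onto a model, the field's save() method accepts the InMemoryUploadedFile directly (a sketch; the model and field names are hypothetical):

# obj is a model instance with an ImageField called `photo` (hypothetical names)
obj.photo.save(new_image.name, new_image, save=True)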
EDIT:
Actually, in hindsight, I like this answer better. Wish it existed when I was trying to solve this issue. :-\
Hope this helps. Cheers!

Difficulty with handling very large image using VIPS

I'm writing a Python (3.4.3) program that uses VIPS (8.1.1) on Ubuntu 14.04 LTS to read many small tiles using multiple threads and put them together into a large image.
In a very simple test:
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Lock
from gi.repository import Vips

canvas = Vips.Image.black(8000, 1000, bands=3)
lock = Lock()

def do_work(x):
    global canvas
    img = Vips.Image.new_from_file('part.tif')  # RGB tiff image
    with lock:
        canvas = canvas.insert(img, x * 1000, 0)

with ThreadPoolExecutor(max_workers=8) as executor:
    for x in range(8):
        executor.submit(do_work, x)

canvas.write_to_file('complete.tif')
I get the correct result. In my full program, the work for each thread involves reading binary data from a source file, turning it into TIFF format, reading the image data and inserting it into the canvas. It seems to work, but when I try to examine the result I run into trouble. Because the image is extremely large (~50000 x 100000 pixels), I couldn't save the entire image in one file, so I tried
canvas = canvas.resize(.5)
canvas.write_to_file('test.jpg')
This takes an extremely long time, and the resulting JPEG has only black pixels. If I resize three times, the program gets killed. I also tried
canvas.extract_area(20000,40000,2000,2000).write_to_file('test.tif')
This results in the error message segmentation fault (core dumped), but it does save an image. There is image content in it, but it seems to be in the wrong place.
I'm wondering what the problem could be?
Below is the code for the complete program. The same logic was also implemented using OpenCV + sharedmem (sharedmem handled the multiprocessing part) and it worked without a problem.
import os
import subprocess
import pickle
from multiprocessing import Lock
from concurrent.futures import ThreadPoolExecutor
import threading

import numpy as np
from gi.repository import Vips

lock = Lock()

def read_image(x):
    with open(file_name, 'rb') as fin:
        fin.seek(sublist[x]['dataStartPos'])
        temp_array = np.fromfile(fin, dtype='int8', count=sublist[x]['dataSize'])

    name_base = os.path.join(rd_path, threading.current_thread().name + 'tempimg')
    with open(name_base + '.jxr', 'wb') as fout:
        temp_array.tofile(fout)

    subprocess.call(['./JxrDecApp', '-i', name_base + '.jxr', '-o', name_base + '.tif'])
    temp_img = Vips.Image.new_from_file(name_base + '.tif')

    with lock:
        global canvas
        canvas = canvas.insert(temp_img, sublist[x]['XStart'], sublist[x]['YStart'])

def assemble_all(filename, ramdisk_path, scene):
    global canvas, sublist, file_name, rd_path, tilesize_x, tilesize_y
    file_name = filename
    rd_path = ramdisk_path

    file_info = fetch_pickle(filename)  # A custom function;
    # this info includes where to begin reading image data, image size and coordinates
    tilesize_x = file_info['sBlockList_P0'][0]['XSize']
    tilesize_y = file_info['sBlockList_P0'][0]['YSize']
    sublist = [item for item in file_info['sBlockList_P0'] if item['SStart'] == scene]
    max_x = max([item['XStart'] for item in file_info['sBlockList_P0']])
    max_y = max([item['YStart'] for item in file_info['sBlockList_P0']])
    canvas = Vips.Image.black((max_x + tilesize_x), (max_y + tilesize_y), bands=3)

    with ThreadPoolExecutor(max_workers=4) as executor:
        for x in range(len(sublist)):
            executor.submit(read_image, x)

    return canvas
The above module (imported as mcv) is called in the driver script:
canvas = mcv.assemble_all(filename, ramdisk_path, 0)
To examine the content, I used
canvas.extract_area(25000, 40000, 2000, 2000).write_to_file('test_vips1.jpg')
I think your problem has to do with the way libvips calculates pixels.
In systems like OpenCV, images are huge areas of memory. You perform a series of operations, and each operation modifies a memory image in some way.
libvips is not like this, though the interface looks similar. In libvips, when you perform an operation on an image, you are actually just adding a new section to a pipeline. It's only when you finally connect the output to some sink (a file on disk, or a region of memory you want filled with image data, or an area of the display) that libvips will actually do any calculations. libvips will then use a recursive algorithm to run a large set of worker threads up and down the whole length of the pipeline, evaluating all of the operations you created at the same time.
To make an analogy with programming languages, systems like OpenCV are imperative, libvips is functional.
The good thing about the way libvips does things is that it can see the whole pipeline at once and it can optimise away most of the memory use and make good use of your CPU. The bad thing is that long sequences of operations can need large amounts of stack to evaluate (whereas with systems like OpenCV you are more likely to be bounded by image size). In particular, the recursive system used by libvips to evaluate means that pipeline length is limited by the C stack, about 2MB on many operating systems.
Here's a simple test program that does more or less what you are doing:
#!/usr/bin/python3

import sys
import pyvips

if len(sys.argv) < 4:
    print("usage: %s image-in image-out n" % sys.argv[0])
    print("  make an n x n grid of image-in")
    sys.exit(1)

tile = pyvips.Image.new_from_file(sys.argv[1])
outfile = sys.argv[2]
size = int(sys.argv[3])

img = pyvips.Image.black(size * tile.width, size * tile.height, bands=3)
for y in range(size):
    for x in range(size):
        img = img.insert(tile, x * tile.width, y * tile.height)

# we're not interested in huge files for this test, just write a small patch
img.crop(10, 10, 100, 100).write_to_file(outfile)
You run it like this:
time ./bigjoin.py ~/pics/k2.jpg out.tif 2
real 0m0.176s
user 0m0.144s
sys 0m0.031s
It loads k2.jpg (a 2k x 2k JPG image), repeats that image into a 2 x 2 grid, and saves a small part of it. This program will work well with very large images; try removing the crop and running it as:
./bigjoin.py huge.tif out.tif[bigtiff] 10
and it'll copy the huge tiff image 100 times into a REALLY huge tiff file. It'll be quick and use little memory.
However, this program will become very unhappy with small images being copied many times. For example, on this machine (a Mac), I can run:
./bigjoin.py ~/pics/k2.jpg out.tif 26
But this fails:
./bigjoin.py ~/pics/k2.jpg out.tif 28
Bus error: 10
With a 28 x 28 output, that's 784 tiles. The way we've built the image, repeatedly inserting a single tile, that's a pipeline 784 operations long -- long enough to cause a stack overflow. On my Ubuntu laptop I can get pipelines up to about 2,900 operations long before it starts failing.
There's a simple way to fix this program: build a wide rather than a deep pipeline. Instead of inserting a single image each time, make a set of strips, then join the strips. Now the pipeline depth will be proportional to the square root of the number of tiles. For example:
img = pyvips.Image.black(size * tile.width, size * tile.height, bands=3)
for y in range(size):
    strip = pyvips.Image.black(size * tile.width, tile.height, bands=3)
    for x in range(size):
        strip = strip.insert(tile, x * tile.width, 0)
    img = img.insert(strip, 0, y * tile.height)
Now I can run:
./bigjoin2.py ~/pics/k2.jpg out.tif 200
Which is 40,000 images joined together.
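As a side note, newer libvips can also build the whole grid in one call with arrayjoin, which keeps the pipeline depth constant regardless of the number of tiles (a sketch, assuming libvips 8.6 or later):

import pyvips

tile = pyvips.Image.new_from_file('k2.jpg')
size = 200

# One operation joins all the tiles into a size x size grid
grid = pyvips.Image.arrayjoin([tile] * (size * size), across=size)
grid.write_to_file('out.tif', bigtiff=True)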

Python server app leaking memory

I'm trying to diagnose why my Python server app is leaking memory. The app takes a request containing an image URL, resizes the image using Vips, and returns it. After every request the memory usage grows by roughly the size of the original image.
from fapws import base
import fapws._evwsgi as evwsgi
from gi.repository import Vips
import urllib2
import hmac
import hashlib
import base64
import StringIO
from boto.s3.connection import S3Connection
from boto.s3.bucket import Bucket

def start():
    evwsgi.start('0.0.0.0', '80')
    evwsgi.set_base_module(base)

    def lfrThumbnail(environ, start_response):
        try:
            parameters = environ['PATH_INFO'].split('/')
            s3File = 'my s3 url' + parameters[0]
            width = float(parameters[1])
            height = float(parameters[2])
            hmacSignatureUser = parameters[3]
            hmacSignature = some hashing code...
            if not (hmacSignatureUser == hmacSignature):
                print hmacSignatureUser
                print hmacSignature
                print hmacSignatureUser == hmacSignature
                raise Exception
            bufferedImage = urllib2.urlopen(s3File).read()
            image = Vips.Image.new_from_buffer(bufferedImage, '')
            imageWidth = float(image.width)
            imageHeight = float(image.height)
            imageAspectRatio = imageWidth / imageHeight
            if (width > imageWidth) or (height > imageHeight):
                image = image
            elif abs((imageAspectRatio / (width / height)) - 1) < 0.05:
                image = image.resize(width / imageWidth)
            else:
                scaleRatioWidth = width / imageWidth
                scaleRatioHeight = height / imageHeight
                maxScale = max(scaleRatioWidth, scaleRatioHeight)
                image = image.resize(maxScale)
                cropStartX = (image.width - width) / 2
                cropStartY = (image.height - height) / 2
                image = image.crop(cropStartX, cropStartY, width, height)
        except Exception, e:
            start_response('500 INTERNAL SERVER ERROR', [('Content-Type', 'text')])
            return ['Error generating thumbnail']

        start_response('200 OK', [
            ('Content-Type', 'image/jpeg'),
            ('Cache-Control: max-stale', '31536000')
        ])
        return [image.write_to_buffer('.jpg[Q=90]')]

    evwsgi.wsgi_cb(('/lfr/', lfrThumbnail))
    evwsgi.set_debug(0)
    evwsgi.run()

if __name__ == '__main__':
    start()
I've tried using muppy and the Pympler tracker, but each diff after the image open/close operations showed only a couple of bytes being used.
Could the external C libraries be the cause of the memory leak? If so, how does one debug that?
If it's relevant, I'm running the Python server inside a Docker container.
I'm the libvips maintainer. It sounds like you're seeing the vips operation cache: vips keeps the last few operations in memory and reuses the results if it can. This can be a huge performance win in some cases.
For a web service, you're probably caching elsewhere so you won't want this, or you won't want a large cache at least. You can control the cache size with vips_cache_set_max() and friends:
http://www.vips.ecs.soton.ac.uk/supported/current/doc/html/libvips/VipsOperation.html#vips-cache-set-max
From Python it's:
Vips.cache_set_max(0)
to turn off the cache completely. You can set the cache to limit by memory use, file descriptor use, or number of operations.
There are a couple of other useful things you can set to watch resource usage. Vips.leak_set(True) makes vips report leaked objects on exit, and also report peak pixel buffer memory use. Vips.cache_set_trace(True) makes it trace all operations as they are called, and shows cache hits.
In your code, I would also enable sequential mode. Add access = Vips.Access.SEQUENTIAL to your new_from_buffer().
The default behaviour is to open images for full random access (since vips doesn't know what operations you'll end up running on the image). For things like JPG, this means that vips will decode the image to a large uncompressed array on open. If the image is under 100 MB, it'll keep this array in memory.
However for a simple resize, you only need to access pixels top-to-bottom, so you can hint sequential access on open. In this mode, vips will only decompress a few scanlines at once from your input and won't ever keep the whole uncompressed image around. You should see a nice drop in memory use and latency.
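Concretely, with the gi-based Vips binding used in the question, the changes might look like this (a sketch; only the lines that differ from the original handler are shown):

from gi.repository import Vips

# Disable the operation cache and report leaked objects / peak memory on exit
Vips.cache_set_max(0)
Vips.leak_set(True)

# ... inside lfrThumbnail: hint that pixels are only needed top-to-bottom
image = Vips.Image.new_from_buffer(bufferedImage, '',
                                   access=Vips.Access.SEQUENTIAL)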
There are a lot of other things you could handle, like exif autorotate, colour management, transparency, jpeg shrink-on-load, and many others, I'm sure you know. The sources to vipsthumbnail might be a useful reference:
https://github.com/jcupitt/libvips/blob/master/tools/vipsthumbnail.c
