Python IMAP search, search results exhaust all memory

I'm trying to fetch all auto-response emails from a specific address in Python using imaplib. Everything worked fine for weeks, but now each time I run my program all my RAM is consumed (several GB!) and the script ends up being killed by the OOM killer.
Here is the code I'm currently using:
import imaplib
import datetime

M = imaplib.IMAP4_SSL('server')
M.login('user', 'pass')
M.select()  # selects INBOX by default
date = (datetime.date.today() - datetime.timedelta(1)).strftime("%d-%b-%Y")
result, data = M.uid('search', None, '(SENTON %s HEADER FROM "auto#site.com" NOT SUBJECT "RE:")' % date)
...
I'm sure that fewer than 100 emails of a few kilobytes each should be returned. What could be the matter here? Or is there a way to limit the number of emails returned?
Thx!

There's no way to know the cause for sure without being able to reproduce the problem (and certainly not without seeing the complete program that triggers it and knowing the versions of all the dependencies you're using).
However, here's my best guess. Several versions of Python include a very memory-wasteful implementation of imaplib. The problem is particularly evident on Windows, but not limited to that platform.
The core of the problem is the way strings are allocated when read from a socket, and the way imaplib reads strings from sockets.
When reading from a socket, Python first allocates a buffer large enough to handle as many bytes as the application asks for. This may be something reasonable sounding, perhaps 16 kB. Then data is read into that buffer and the buffer is resized down to fit the number of bytes actually read.
The efficiency of this operation depends on the quality of the platform's reallocation implementation. Resizing the buffer may move it to a more suitable location, where the smaller size avoids wasting much memory. Or it may just mark the tail of the region as no longer allocated and therefore re-usable (and it may even be able to re-use it in practice). Or it might end up wasting that technically unallocated memory.
Imagine the cumulative effects of that memory being wasted if you have to read a few dozen kB of data, and the data arrives from the network a few dozen bytes at a time. Worse, imagine if the data is really trickling, and you only get a few bytes at a time. Or if you're reading a very "large" response of several hundred kB.
The amount of memory wasted - effectively allocated by the process, but not usable in any meaningful way - can be huge. 100 kB of data read 5 bytes at a time requires 20,480 buffers. If each buffer starts off at 16 kB and is unsuccessfully shrunk, so that each remains at 16 kB, then you've allocated at least 320 MB of memory to hold that 100 kB of data.
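For contrast, a pattern that avoids the allocate-then-shrink cycle is to read into a single preallocated buffer. This is only an illustrative sketch of the idea (the function name is mine), not what imaplib does internally:
import socket

def read_exactly(sock, total):
    # One buffer, allocated once; recv_into() fills it in place,
    # so no per-read oversized allocations are created and then shrunk.
    buf = bytearray(total)
    view = memoryview(buf)
    received = 0
    while received < total:
        n = sock.recv_into(view[received:])
        if n == 0:
            raise ConnectionError("socket closed after %d of %d bytes" % (received, total))
        received += n
    return bytes(buf)
However slowly the data trickles in, the process holds one buffer of total bytes rather than thousands of 16 kB ones.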
Some versions of imaplib exacerbated this problem by introducing multiple layers of buffering and copying. A very old version (hopefully not one you're actually using) even read 1 byte at a time (which would result in 1.6 GB of memory usage in the above scenario).
Of course, this problem usually doesn't show up on Linux, where the re-allocator is not so bad. And at various points in earlier Python releases (prior to the most recent 2.x release), the bug was "fixed", so I wouldn't expect to see it show up these days. And it doesn't explain why your program ran fine for a while before failing this way.
But it is my best guess.

Related

Python: MemoryError (script runs sometimes)

I have a script which sometimes runs successfully, providing the desired output, but when rerun moments later it provides the following error:
numpy.core._exceptions.MemoryError: Unable to allocate 70.8 MiB for an array with shape (4643100, 2) and data type float64
I realise this question has been answered several times (like here), but so far none of the solutions have worked for me. I was wondering if anyone has any idea how it's possible that sometimes the script runs fine and then moments later it provides an error?
I have lowered my computer's RAM usage, increased the virtual memory, and rebooted my laptop, none of which seemed to help (Windows 10, 8.0 GB RAM, Python 3.9.2 32-bit).
PS: Unfortunately not possible to share the script/create dummy.
Python is a garbage-collected language, and garbage collection is non-deterministic. This means that peak memory usage may be different each time a program is run: the first time, the peak happens to stay below the available memory; the next time, the peak is enough to consume everything available. This also assumes that the available memory on the host system is constant, which it isn't. The fluctuation in available memory, i.e. the memory not in use by other running processes, is another reason the program may raise a MemoryError on one run but terminate without error on another.
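If the peak is driven by large temporary arrays, one way to lower it is to drop references to them as soon as they are no longer needed. A minimal sketch, assuming the big array really is only an intermediate (the shape is taken from the error message; the rest is illustrative):
import gc
import numpy as np

data = np.zeros((4643100, 2), dtype=np.float64)  # the ~70.8 MiB array from the error
totals = data.sum(axis=0)                        # stand-in for the real computation

del data      # drop the only reference to the large intermediate
gc.collect()  # ask the collector to clean up promptly before the next allocation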
Sidenote: Increase virtual memory as a last resort. It isn't memory, it's disk that is used like memory, and it is much slower than memory.

How do you fix a memory leak within Django tests?

Recently I started having some problems with Django (3.1) tests, which I finally tracked down to some kind of memory leak.
I normally run my suite (roughly 4000 tests at the moment) with --parallel=4 which results in a high memory watermark of roughly 3GB (starting from 500MB or so).
For auditing purposes, though, I occasionally run it with --parallel=1 - when I do this, the memory usage keeps increasing, ending up over the VM's allocated 6GB.
I spent some time looking at the data and it became clear that the culprit is, somehow, WebTest - more specifically, its response.html and response.forms: each call during the test case might allocate a few MB (two or three, generally) which don't get released at the end of the test method and, more importantly, not even at the end of the TestCase.
I've tried everything I could think of: gc.collect() with gc.DEBUG_LEAK shows me a whole lot of collectable items, but it frees no memory at all; using delattr() on various TestCase and TestResponse attributes resulted in no change at all; and so on.
I'm quite literally at my wits' end, so any pointer to solve this (beside editing the thousand or so tests which use WebTest responses, which is really not feasible) would be very much appreciated.
(Please note that I also tried using guppy, tracemalloc and memory_profiler, but none of them gave me any kind of actionable information.)
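For anyone hunting a similar leak, one per-test diagnostic pattern (a sketch, not something this suite actually used - and as noted above these tools produced nothing actionable here) is to snapshot tracemalloc around each test and print the largest deltas:
import tracemalloc
import unittest

class LeakHuntingTestCase(unittest.TestCase):
    def setUp(self):
        tracemalloc.start()
        self._before = tracemalloc.take_snapshot()

    def tearDown(self):
        after = tracemalloc.take_snapshot()
        # The five allocation sites that grew the most during this test.
        for stat in after.compare_to(self._before, 'lineno')[:5]:
            print(stat)
        tracemalloc.stop()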
Update
I found that one of our EC2 testing instances isn't affected by the problem, so I spent some more time trying to figure this out.
Initially, I tried to find the "sensible" potential causes - for instance, the cached template loader, which was enabled on my local VM and disabled on the EC2 instance - without success.
Then I went all in: I replicated the EC2 virtualenv (with pip freeze) and the settings (copying the dotenv), and checked out the same commit where the tests were running normally on the EC2.
Et voilà! THE MEMORY LEAK IS STILL THERE!
Now, I'm officially giving up and will use --parallel=2 for future tests until some absolute guru can point me in the right direction.
Second update
And now the memory leak is there even with --parallel=2. I guess that's somehow better, since it looks increasingly like it's a system problem rather than an application problem. Doesn't solve it but at least I know it's not my fault.
Third update
Thanks to Tim Boddy's reply to this question I tried using chap to figure out what's making memory grow. Unfortunately I can't "read" the results properly, but it looks like some non-Python library is actually causing the problem.
So, this is what I've seen analyzing the core after a few minutes running the tests that I know cause the leak:
chap> summarize writable
49 ranges take 0x1e0aa000 bytes for use: unknown
1188 ranges take 0x12900000 bytes for use: python arena
1 ranges take 0x4d1c000 bytes for use: libc malloc main arena pages
7 ranges take 0x3021000 bytes for use: stack
139 ranges take 0x476000 bytes for use: used by module
1384 writable ranges use 0x38b5d000 (951,439,360) bytes.
chap> count used
3144197 allocations use 0x14191ac8 (337,189,576) bytes.
The interesting point is that the non-leaking EC2 instance shows pretty much the same values as the ones I get from count used - which would suggest that those "unknown" ranges are the actual hogs.
This is also supported by the output of summarize used (showing first few lines):
Unrecognized allocations have 886033 instances taking 0x8b9ea38(146,401,848) bytes.
Unrecognized allocations of size 0x130 have 148679 instances taking 0x2b1ac50(45,198,416) bytes.
Unrecognized allocations of size 0x40 have 312166 instances taking 0x130d980(19,978,624) bytes.
Unrecognized allocations of size 0xb0 have 73886 instances taking 0xc66ca0(13,003,936) bytes.
Unrecognized allocations of size 0x8a8 have 3584 instances taking 0x793000(7,942,144) bytes.
Unrecognized allocations of size 0x30 have 149149 instances taking 0x6d3d70(7,159,152) bytes.
Unrecognized allocations of size 0x248 have 10137 instances taking 0x5a5508(5,920,008) bytes.
Unrecognized allocations of size 0x500018 have 1 instances taking 0x500018(5,242,904) bytes.
Unrecognized allocations of size 0x50 have 44213 instances taking 0x35f890(3,537,040) bytes.
Unrecognized allocations of size 0x458 have 2969 instances taking 0x326098(3,301,528) bytes.
Unrecognized allocations of size 0x205968 have 1 instances taking 0x205968(2,120,040) bytes.
The size of those single-instance allocations is very similar to the kind of deltas I see if I add calls to resource.getrusage(resource.RUSAGE_SELF).ru_maxrss in my test runner when starting/stopping tests - but they're not recognized as Python allocations, hence my feeling that the culprit is outside Python.
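For reference, that kind of measurement can be as simple as the following sketch (the helper name is mine; ru_maxrss is reported in kilobytes on Linux and in bytes on macOS):
import resource

def max_rss_mb():
    # Peak resident set size of the current process so far, in MB.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

before = max_rss_mb()
# ... run one test here ...
print("peak RSS grew by %.1f MB" % (max_rss_mb() - before))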
First of all, a huge apology: I was mistaken in thinking WebTest was the cause of this, and the reason was indeed in my own code, rather than libraries or anything else.
The real cause was a mixin class where I, unthinkingly, added a dict as class attribute, like
class MyMixin:
    errors = dict()  # class attribute: one shared dict for every instance, never reset
Since this mixin is used in a few forms, and the tests generate a fair amount of form errors (which are added to the dict), this ended up hogging memory.
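The general fix for this pattern - shown here as a standalone sketch rather than the project's actual code - is to create the mutable container per instance instead of at class level (if something further up the inheritance chain already exposes errors as a property, as Django forms do, a different attribute name would be needed):
class MyMixin:
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Each instance gets its own dict, which is released together with
        # the instance instead of accumulating on a shared class attribute.
        self.errors = dict()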
While this is not very interesting in itself, there are a few takeaways that may be helpful to future explorers who stumble across the same kind of problem. They might all be obvious to everybody except me and a single other developer - in which case, hello other developer.
The reason why the same commit had different behaviors on the EC2 machine and my own VM is that the branch in the remote machine hadn't been merged yet, so the commit that introduced the leak wasn't there poisoning the environment.
The takeaway here is: make sure the code you're testing is the same, not just the commit.
Low-level memory analysis might help in some cases but it's not a skill you pick up in half a day: I spent a long time trying to make sense of allocations and objects and whatever without getting any closer to the solution.
This kind of mistake can be incredibly costly - if I had a few hundred fewer tests, I wouldn't have ended up with an OOM error, and I probably wouldn't have noticed the problem at all. Until it was in production, that is.
That could be fixed with some kind of linter/static analysis too, if there were one which flags this kind of construction as potentially harmful. Unfortunately, there isn't one (that I could find).
git bisect is your friend, as long as you can find a commit that actually works.

Using more memory than available

I have written a program that expands a database of prime numbers. This program is written in Python and runs on Windows 10 (x64) with 8 GB of RAM.
The program stores all primes it has found in a list of integers for further calculations and uses approximately 6-7GB of RAM while running. During some runs however, this figure has dropped to below 100MB. The memory usage then stays low for the duration of the run, though increasing as expected as more numbers are added to the prime array. Note that not all runs result in a memory drop.
Memory usage measured with task manager
These seemingly random drops have led me to the following theories:
1. There's a bug in my code, making it drop critical data and messing up the results (most likely, but not supported by the results).
2. Python just happens to optimize my code extremely well after a while.
3. Python or Windows is compensating for my over-use of the RAM by cleaning out portions of my prime-number array that aren't used much (eventually resulting in incorrect calculations).
4. Python or Windows is compensating for my over-use of the RAM by allocating disk space instead of RAM.
Questions
What could be the reason(s) for this memory drop?
How does python handle programs that use more than available RAM?
How does Windows handle programs that use more than available RAM?
1, 2, and 3 are incorrect theories.
4 is correct. Windows (not Python) is moving some of your process memory to swap space. This is almost totally transparent to your application - you don't need to do anything special to respond to or handle this situation. The only thing you will notice is your application may get slower as information is written to and read from disk. But it all happens transparently. See https://en.wikipedia.org/wiki/Virtual_memory for more information.
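If you want to watch it happening, a sketch using the third-party psutil package (an assumption; it isn't mentioned in the question) is to log the process's resident size against its virtual size - when Windows pages parts of the prime list out, the resident figure drops while the virtual figure stays roughly the same:
import psutil

proc = psutil.Process()   # the current Python process
info = proc.memory_info()
print("resident: %.0f MB, virtual: %.0f MB" % (info.rss / 1e6, info.vms / 1e6))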
Have you heard of paging? Windows dumps some RAM (that hasn't been used in a while) to your hard drive to keep your computer from running out of RAM and ultimately crashing.
This memory management is handled by Windows, not Python. Also, if you use Windows 10, it will compress your memory, somewhat like a zip file.

Python Process Terminated due to "Low Swap" When Writing To stdout for Data Science

I'm new to python so I apologize for any misconceptions.
I have a python file that needs to read/write to stdin/stdout many many times (hundreds of thousands) for a large data science project. I know this is not ideal, but I don't have a choice in this case.
After about an hour of running (close to halfway completed), the process gets terminated on my mac due to "Low Swap" which I believe refers to lack of memory. Apart from the read/write, I'm hardly doing any computing and am really just trying to get this to run successfully before going any farther.
My Question: Does writing to stdin/stdout a few hundred thousand times use up that much memory? The file basically needs to loop through some large lists (15k ints) and do it a few thousand times. I've got 500 gigs of hard drive space and 12 gigs of RAM and am still getting the errors. I even spun up an EC2 instance on AWS and STILL had memory errors. Is it possible that I have some sort of memory leak in the script even though I'm hardly doing anything? Is there any way I can reduce the memory usage to run this successfully?
Appreciate any help.
the process gets terminated on my mac due to "Low Swap" which I believe refers to lack of memory
Swap space isn't extra RAM - it's disk space the operating system uses as an overflow area when RAM fills up.
When you read a file, its data is cached in main memory (CPU caches and RAM) and can simply be dropped once it's no longer needed.
When you write, however, the changes have to be kept somewhere until they reach disk. If you keep producing output faster than it can be flushed, RAM fills with dirty data, the least recently used (LRU) pages get moved out to swap, and once swap itself runs low the OS starts terminating processes - which is the "Low Swap" kill you're seeing on the Mac.
Is it possible that I have some sort of memory leak in the script even though I'm hardly doing anything?
Possibly
Is there any way I can reduce the memory usage to run this successfully?
One way is to think about how you are managing the file(s). Reads won't hurt, because data cached from a read can simply be dropped without needing to be saved anywhere. For writes, you might want to explicitly flush and close the output (closing and reopening the file works) after a certain amount of information has been processed or a certain amount of time has gone by, so the written data is pushed out to disk rather than piling up in memory.
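A minimal sketch of that advice for the stdout-heavy loop described in the question (the flush interval and the stand-in data are arbitrary illustrations, not anything the original script uses):
import sys

FLUSH_EVERY = 10000  # flush after this many lines; tune to taste

def emit(records):
    # Write records to stdout, flushing periodically so output is pushed
    # onward instead of piling up in buffers.
    for i, record in enumerate(records):
        sys.stdout.write("%s\n" % record)
        if i % FLUSH_EVERY == 0:
            sys.stdout.flush()
    sys.stdout.flush()

emit(range(100000))  # stand-in for the real data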

Efficient writing to a Compact Flash in python

I'm writing a GUI to perform a glorified 'dd'.
I could just subprocess out to 'dd', but I thought I might as well use Python's open()/read()/write() if I can, as it'll let me display progress much more easily.
Prompted by this link here I have:
input = open('filename.img', 'rb')
output = open("/dev/sdc", 'wb')

while True:
    buffer = input.read(1024)
    if buffer:
        output.write(buffer)
    else:
        break

input.close()
output.close()
...however it is horribly slow - or at least far slower than dd (around 4-5x slower).
I had a play and noticed that altering the number of bytes 'buffered' had a huge effect on the speed of completion. Raising it to 2048, for example, seems to halve the time taken. Perhaps going OT for SO here, but I guess the flash has an optimum number of bytes to be written at once? Could anyone suggest how I discover this?
The image and card are 1 GB, so I would very much like to get back to the ~5 minutes dd took if possible. I appreciate that in all likelihood I won't match it.
Rather than trial and error, would anyone be able to suggest a way to optimise the above code, and the reasoning as to why it works? Especially, what value to use for input.read(), for example?
One restriction: python 2.4.3 on linux (centos5) (please don't hurt me)
Speed depending on buffer size is not specific to compact flash; it is inherent to all I/O with (relatively) slow devices, and indeed to the per-call overhead of system calls. You should make the buffer size as large as possible without exhausting memory - 2 MiB should be enough for a flash drive.
You should use the time and strace utilities to determine why your program is slower. If time shows a large user/real ratio (large meaning greater than 0.1), you can optimize your Python interpreter - CPython 2.4 is pretty slow, and you're creating new objects all the time instead of writing into a preallocated buffer. If there is a significant difference in the sys timings, analyze the syscalls made by both programs (with strace) and try to emit the ones dd does.
Also note that you must call fsync (or run the sync program) afterwards to measure the real time it took to write the file to disk (or open the output file with O_DIRECT). Otherwise, the operating system will let your program exit and just keep all the written data in buffers that are then gradually written out to the actual disk. To check that you're doing it right, remove the disk immediately after your program finishes. Note that the speed difference can be staggering. This effect is less noticeable if the amount of data you write to the disk (CF card) is much larger than the available physical memory.
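Putting that advice together - a large buffer, an explicit fsync, and the progress display that motivated using Python in the first place - a sketch might look like the following; the 2 MiB buffer is the figure suggested above and the file names are taken from the question:
import os
import sys

BUF_SIZE = 2 * 1024 * 1024   # 2 MiB per read

src = open('filename.img', 'rb')
dst = open('/dev/sdc', 'wb')
total = os.path.getsize('filename.img')
written = 0

while True:
    chunk = src.read(BUF_SIZE)
    if not chunk:
        break
    dst.write(chunk)
    written += len(chunk)
    sys.stdout.write("\r%3d%% written" % (100 * written // total))  # crude progress display
    sys.stdout.flush()

dst.flush()
os.fsync(dst.fileno())   # make sure the data has really reached the card
src.close()
dst.close()
sys.stdout.write("\n")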
So with a little help, I've removed the 'buffer' bit completely and added an os.fsync().
import os

input = open('filename.img', 'rb')
output = open("/dev/sdc", 'wb')

output.write(input.read())   # read the whole image into memory and write it in one go
input.close()

output.flush()
os.fsync(output.fileno())    # ensure the data is actually flushed to the card
output.close()
