Using more memory than available - python

I have written a program that expands a database of prime numbers. The program is written in Python and runs on Windows 10 (x64) with 8 GB of RAM.
The program stores all primes it has found in a list of integers for further calculations and uses approximately 6-7 GB of RAM while running. During some runs, however, this figure has dropped below 100 MB. The memory usage then stays low for the rest of the run, though it increases as expected as more numbers are added to the prime array. Note that not all runs result in a memory drop.
(Memory usage measured with Task Manager.)
These seemingly random drops have led me to the following theories:
There's a bug in my code, making it drop critical data and messing up the results (most likely but not supported by the results)
Python just happens to optimize my code extremely well after a while.
Python or Windows is compensating for my over-usage of the RAM by cleaning out portions of my prime-number array that aren't used that much. (eventually resulting in incorrect calculations)
Python or Windows is compensating for my over-usage of the RAM by allocating disk space instead of ram.
Questions
What could be the reason(s) for this memory drop?
How does python handle programs that use more than available RAM?
How does Windows handle programs that use more than available RAM?

1, 2, and 3 are incorrect theories.
4 is correct. Windows (not Python) is moving some of your process memory to swap space. This is almost totally transparent to your application - you don't need to do anything special to respond to or handle this situation. The only thing you will notice is your application may get slower as information is written to and read from disk. But it all happens transparently. See https://en.wikipedia.org/wiki/Virtual_memory for more information.
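A hedged way to watch this happening from inside the script is to poll the process's resident set size alongside the system's swap usage; the sketch below assumes the third-party psutil package is installed (pip install psutil), and the reporting function is purely illustrative.

    # Sketch: observe paging from inside the program (assumes psutil is installed).
    import psutil

    proc = psutil.Process()  # the current Python process

    def report_memory():
        mem = proc.memory_info()
        swap = psutil.swap_memory()
        print("resident: %.0f MiB, virtual: %.0f MiB, system swap used: %.0f MiB"
              % (mem.rss / 2**20, mem.vms / 2**20, swap.used / 2**20))

    report_memory()  # call this periodically from your main loop

If you call something like report_memory() as the program runs, you should see the resident figure drop at the same moment Task Manager does, while the virtual size stays large and swap usage grows.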

Have you heard of paging? Windows dumps some RAM (that hasn't been used in a while) to your hard drive to keep your computer from running out of RAM and ultimately crashing.
Only Windows deals with memory management here, not Python. And if you use Windows 10, it will also compress memory that hasn't been used recently, somewhat like a zip file.

Related

Python: MemoryError (script runs sometimes)

I have a script which sometimes runs successfully, providing the desired output, but when rerun moments later it provides the following error:
numpy.core._exceptions.MemoryError: Unable to allocate 70.8 MiB for an array with shape (4643100, 2) and data type float64
I realise this question has been answered several times (like here), but so far none of the solutions have worked for me. I was wondering if anyone has any idea how it's possible that sometimes the script runs fine and then moments later it provides an error?
I have lowered my computer's RAM usage, have increased the virtual memory, rebooted my laptop, none of which seemed to help (Windows 10, RAM 8.0GB, python 3.9.2 32 bit).
PS: Unfortunately not possible to share the script/create dummy.
Python is a garbage-collected language, and garbage collection is non-deterministic. This means that peak memory usage can differ each time the program is run: on one run the peak stays below the available memory, while on the next it is enough to exhaust it. This also assumes that the available memory on the host system is constant, which it is not. Fluctuation in the available memory, i.e. the memory not in use by other running processes, is another reason the program may raise a MemoryError one time but terminate without error another time.
Sidenote: Increase virtual memory as a last resort. It isn't memory, it's disk that is used like memory, and it is much slower than memory.
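One way to make peak usage less dependent on when the collector happens to run is to drop references to large intermediates as soon as they are no longer needed and collect explicitly. A minimal sketch, where the array shape is taken from the error message and the per-chunk work is a placeholder:

    # Sketch: release large intermediates promptly instead of relying on GC timing.
    import gc
    import numpy as np

    def process(data):
        # placeholder for the real computation
        return data.sum()

    data = np.zeros((4643100, 2), dtype=np.float64)  # ~70.8 MiB, as in the error message
    result = process(data)

    del data      # drop the only reference to the big array
    gc.collect()  # reclaim it now, before the next large allocation

Note that on a 32-bit interpreter the address space itself (roughly 2 GB per process) is usually the real limit, so moving to 64-bit Python, or using a smaller dtype such as float32, tends to help more than tuning collection.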

Can Pickle handle files larger than the RAM installed on my machine?

I'm using pickle to save my NLP classifier, built with the TextBlob library, to disk.
I settled on pickle after a lot of searching related to this question. At the moment I'm working locally and I have no problem loading the pickle file (which is 1.5 GB) on my machine with an i7 and 16 GB of RAM. But the idea is that my program will eventually have to run on my server, which only has 512 MB of RAM installed.
Can pickle handle such a large file or will I face memory issues?
On my server I've got Python 3.5 installed and it is a Linux server (not sure which distribution).
I'm asking because at the moment I can't access my server, so I can't just try it and find out what happens, but at the same time I'm unsure whether I can keep this approach or will have to find another solution.
Unfortunately this is difficult to accurately answer without testing it on your machine.
Here are some initial thoughts:
There is no inherent size limit that the pickle module enforces, but you're pushing the boundaries of its intended use; it's not designed for individual large objects. However, since you're using Python 3.5, you can take advantage of PEP 3154, which adds better support for large objects. You should specify pickle.HIGHEST_PROTOCOL when you dump your data (see the sketch after these notes).
You will likely have a large performance hit because you're trying to deal with an object that is 3x the size of your memory. Your system will probably start swapping, and possibly even thrashing. RAM is so cheap these days that bumping it up to at least 2 GB should help significantly.
To handle the swapping, make sure you have enough swap space available (a large swap partition if you're on Linux, or enough space for the swap file on your primary partition on Windows).
As pal sch's comment shows, Pickle is not very friendly to RAM consumption during the pickling process, so you may have to deal with Python trying to get even more memory from the OS than the 1.5GB we may expect for your object.
Given these considerations, I don't expect it to work out very well for you. I'd strongly suggest upgrading the RAM on your target machine to make this work.
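For the protocol point above, a minimal sketch of dumping and loading with the highest protocol (on Python 3.5 this is protocol 4, from PEP 3154); the classifier object and file name are placeholders:

    # Sketch: pickle with the highest available protocol.
    import pickle

    def save_classifier(classifier, path="classifier.pkl"):
        with open(path, "wb") as f:
            pickle.dump(classifier, f, protocol=pickle.HIGHEST_PROTOCOL)

    def load_classifier(path="classifier.pkl"):
        with open(path, "rb") as f:
            return pickle.load(f)  # still needs enough RAM plus swap to hold the whole object

Note that the protocol only affects how the object is serialised; loading still materialises the full object in (virtual) memory, so the swap-space considerations above still apply.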
I don't see how you could load an object into RAM that exceeds the RAM, i.e. bytes(num_bytes_greater_than_ram) will always raise a MemoryError.

Python Process Terminated due to "Low Swap" When Writing To stdout for Data Science

I'm new to python so I apologize for any misconceptions.
I have a python file that needs to read/write to stdin/stdout many many times (hundreds of thousands) for a large data science project. I know this is not ideal, but I don't have a choice in this case.
After about an hour of running (close to halfway completed), the process gets terminated on my mac due to "Low Swap" which I believe refers to lack of memory. Apart from the read/write, I'm hardly doing any computing and am really just trying to get this to run successfully before going any farther.
My Question: Does writing to stdin/stdout a few hundred thousand times use up that much memory? The file basically needs to loop through some large lists (15k ints) and do it a few thousand times. I've got 500 gigs of hard drive space and 12 gigs of RAM and am still getting the errors. I even spun up an EC2 instance on AWS and STILL had memory errors. Is it possible that I have some sort of memory leak in the script even though I'm hardly doing anything? Is there any way I can reduce the memory usage to run this successfully?
Appreciate any help.
the process gets terminated on my mac due to "Low Swap" which I believe refers to lack of memory
Swap space is not RAM; it is disk space that the operating system uses as an overflow area when RAM fills up.
When a process reads a file, the data is cached in main memory (RAM). When that memory is needed elsewhere, the cached pages can simply be dropped, because the data still exists on disk.
Writes are different: the modified data has to be kept somewhere until it is flushed out. If you keep producing new data faster than it gets written out, RAM fills up and the least recently used (LRU) pages are moved to swap. Swap is finite too, and when macOS decides swap is running low it starts terminating processes, which is the "Low Swap" condition you are seeing.
Is it possible that I have some sort of memory leak in the script even though I'm hardly doing anything?
Possibly
Is there any way I can reduce the memory usage to run this successfully?
One way is to think about how you are managing the file(s). Reads do not put pressure on swap, because pages that were only read can simply be discarded without being saved. For output, you might want to explicitly flush and close the file (closing and reopening it works) after a certain amount of data has been processed or a certain amount of time has gone by, so the written data leaves memory and lands on disk.
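If the output really is going to stdout, a related mitigation is to write and flush in batches rather than letting buffered output and intermediate lists accumulate. A minimal sketch, with an illustrative batch size:

    # Sketch: stream results to stdout in batches and flush, instead of holding them in the process.
    import sys

    def emit(results, batch_size=10000):
        buffer = []
        for value in results:
            buffer.append(str(value))
            if len(buffer) >= batch_size:
                sys.stdout.write("\n".join(buffer) + "\n")
                sys.stdout.flush()  # hand the data to the OS now rather than buffering it
                buffer.clear()
        if buffer:
            sys.stdout.write("\n".join(buffer) + "\n")
            sys.stdout.flush()

    emit(range(15000))  # e.g. one of the 15k-int lists from the question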

Run a python source code on RAM

I have written a program that does some processing, and I want to reduce its execution time. I think this could be done if I ran it from my RAM, which is 1 GB.
So will running my program from RAM make any difference to the execution time, and if yes, how can it be done?
Believe it or not, when you use a modernish computer system, most of your computation is done from RAM. (Well, technically, it's "done" from processor registers, but those are filled from RAM so let's brush that aside for the purposes of this answer)
This is thanks to the magic we call caches and buffers. A disk "cache" in RAM is filled by the operating system whenever something is read from permanent storage. Any further reads of that same data (until and unless it is "evicted" from the cache) only read memory instead of the permanent storage medium.
A "buffer" works similarly for write output, with data first being written to RAM and then eventually flushed out to the underlying medium.
So, in the course of normal operation, any runs of your program after the first (unless you've done a lot of work in between), will already be from RAM. Ditto the program's input file: if it's been read recently, it's already cached in memory! So you're unlikely to be able to speed things up by putting it in memory yourself.
Now, if you want to force things for some reason, you can create a "ramdisk", which is a filesystem backed by RAM. In Linux the easy way to do this is to mount "tmpfs" or put files in the /dev/shm directory. Files on a tmpfs filesystem go away when the computer loses power and are entirely stored in RAM, but otherwise behave like normal disk-backed files. From the way your question is phrased, I don't think this is what you want. I think your real answer is "whatever performance problems you think you have, this is not the cause, sorry".
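If you do want to try the ramdisk route anyway on Linux, a minimal sketch is to copy the input file onto the tmpfs mount at /dev/shm and read it from there; the file name here is illustrative:

    # Sketch: stage an input file on /dev/shm (Linux tmpfs; contents are RAM-backed and vanish on reboot).
    import shutil
    from pathlib import Path

    src = Path("input.dat")            # illustrative input file on disk
    dst = Path("/dev/shm") / src.name  # /dev/shm is a tmpfs mount on most Linux systems

    shutil.copy(src, dst)              # one slow read from the disk...
    data = dst.read_bytes()            # ...after which reads are served from RAM

In practice the OS page cache usually gives you the same effect automatically after the first read, which is the point made above.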

Python process consuming increasing amounts of system memory, but heapy shows roughly constant usage

I'm trying to identify a memory leak in a Python program I'm working on. I'm currently running Python 2.7.4 on 64-bit Mac OS. I installed heapy to hunt down the problem.
The program involves creating, storing, and reading a large database using the shelve module. I am not using the writeback option, which I know can create memory problems.
Heapy shows that memory usage is roughly constant during program execution. Yet my activity monitor shows rapidly increasing memory. Within 15 minutes, the process has consumed all my system memory (16 GB), and I start seeing page outs. Any idea why heapy isn't tracking this properly?
Take a look at this fine article. You are, most likely, not seeing memory leaks but memory fragmentation. The best workaround I have found is to identify what the output of your large working set operation actually is, load the large dataset in a new process, calculate the output, and then return that output to the original process.
This answer has some great insight and an example, as well. I don't see anything in your question that seems like it would preclude the use of PyPy.
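A minimal sketch of the workaround described above, using the standard multiprocessing module; the worker function and its argument are placeholders for the real shelve-based computation. When the child process exits, all of the memory it allocated (and fragmented) is returned to the operating system:

    # Sketch: run the memory-hungry step in a child process so its memory is fully released on exit.
    import multiprocessing as mp

    def heavy_step(key):
        # placeholder: open the shelve database, build the large intermediate structures,
        # and return only the small result the parent actually needs
        return len(key)

    if __name__ == "__main__":
        pool = mp.Pool(processes=1)
        result = pool.apply(heavy_step, ("some-record-key",))
        pool.close()
        pool.join()
        print(result)  # the parent keeps only the result; the child's memory is gone with the child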
