I have a script which sometimes runs successfully, providing the desired output, but when rerun moments later it provides the following error:
numpy.core._exceptions.MemoryError: Unable to allocate 70.8 MiB for an array with shape (4643100, 2) and data type float64
I realise this question has been answered several times (like here), but so far none of the solutions have worked for me. I was wondering if anyone has any idea how it's possible that sometimes the script runs fine and then moments later it provides an error?
I have lowered my computer's RAM usage, increased the virtual memory, and rebooted my laptop, none of which helped (Windows 10, 8.0 GB RAM, Python 3.9.2 32-bit).
PS: Unfortunately, it is not possible to share the script or create a dummy example.
Python is a garbage-collected language, and the timing of garbage collection is not fully deterministic: CPython frees most objects as soon as their reference count drops to zero, but objects caught in reference cycles are only reclaimed when the cyclic collector runs. This means that peak memory usage may differ each time a program is run. On one run, peak usage stays below the available memory; on the next, it is high enough to consume all of it.
That explanation assumes the available memory on the host system is constant, which it is not. Fluctuation in available memory, i.e. the memory not in use by other running processes, is another reason the program may raise a MemoryError on one run but finish without error on another.
Sidenote: Increase virtual memory as a last resort. It isn't memory, it's disk that is used like memory, and it is much slower than memory.
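To put numbers on this, here is a small sketch (array_nbytes is a hypothetical helper, not a NumPy function) that reproduces the size in the error message and reports whether the interpreter is 32-bit. The question mentions a 32-bit build, and a 32-bit process on Windows typically has only about 2 GiB of address space, so a contiguous block may be unavailable even when the machine has free RAM. Treat that as a possible contributing factor, not a confirmed diagnosis.

```python
import struct


def array_nbytes(shape, itemsize):
    """Bytes needed for a dense array with the given shape and per-element size."""
    total = itemsize
    for dim in shape:
        total *= dim
    return total


# Shape and dtype taken from the error message; float64 uses 8 bytes per element.
needed = array_nbytes((4643100, 2), 8)
needed_mib = needed / 2**20

# Pointer size in bits: 32 for a 32-bit interpreter, 64 for a 64-bit one.
interpreter_bits = struct.calcsize("P") * 8

print(f"{needed} bytes = {needed_mib:.1f} MiB, {interpreter_bits}-bit Python")
```

If the last number printed is 32, trying the same script under a 64-bit Python build is a cheap experiment.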
There are a lot of questions on here about profiling Python memory usage in specific functions, or monitoring overall process RAM usage, or getting RAM usage breakdowns at specific manually instrumented places in a program. But none of this helps me at all. What I need to do is find which part of my code is causing a large RAM allocation.
For context, I am doing some work with TensorFlow 2, and at a certain point I get this warning:
Allocation of 10000000000 exceeds 10% of system memory.
Ok, great, I should look into that, I probably accidentally triggered some enormous broadcast or something. But where the heck did that happen? I have no idea. I thought there would be a simple way to profile my code and find out where the largest RAM allocation occurred, and what the call stack at that point was, but so far I have not found any way to do it amongst the zillions of Python memory tools that I looked through. Any ideas? Did I miss something obvious?
I have written a program that expands a database of prime numbers. This program is written in Python and runs on Windows 10 (x64) with 8 GB RAM.
The program stores all primes it has found in a list of integers for further calculations and uses approximately 6-7GB of RAM while running. During some runs however, this figure has dropped to below 100MB. The memory usage then stays low for the duration of the run, though increasing as expected as more numbers are added to the prime array. Note that not all runs result in a memory drop.
Memory usage measured with task manager
These seemingly random drops have led me to the following theories:
1. There's a bug in my code, making it drop critical data and messing up the results (most likely, but not supported by the results).
2. Python just happens to optimize my code extremely well after a while.
3. Python or Windows is compensating for my over-usage of the RAM by cleaning out portions of my prime-number array that aren't used that much (eventually resulting in incorrect calculations).
4. Python or Windows is compensating for my over-usage of the RAM by allocating disk space instead of RAM.
Questions
What could be the reason(s) for this memory drop?
How does Python handle programs that use more than available RAM?
How does Windows handle programs that use more than available RAM?
1, 2, and 3 are incorrect theories.
4 is correct. Windows (not Python) is moving some of your process memory to swap space. This is almost totally transparent to your application - you don't need to do anything special to respond to or handle this situation. The only thing you will notice is your application may get slower as information is written to and read from disk. But it all happens transparently. See https://en.wikipedia.org/wiki/Virtual_memory for more information.
Have you heard of paging? Windows dumps some RAM (that hasn't been used in a while) to your hard drive to keep your computer from running out of RAM and ultimately crashing.
Only Windows deals with memory management. Although, if you use Windows 10, it will also compress your memory, somewhat like a zip file.
I'm using pickle for saving on disk my NLP classifier built with the TextBlob library.
I'm using pickle after a lot of searches related to this question. At the moment I'm working locally and I have no problem loading the pickle file (which is 1.5 GB) on my i7 machine with 16 GB RAM. But the idea is that my program, in the future, has to run on my server, which only has 512 MB RAM installed.
Can pickle handle such a large file or will I face memory issues?
On my server I've got Python 3.5 installed and it is a Linux server (not sure which distribution).
I'm asking because at the moment I can't access my server, so I can't just try and find out what happens, but at the same time I'm doubtful if I can keep this approach or I have to find other solutions.
Unfortunately this is difficult to accurately answer without testing it on your machine.
Here are some initial thoughts:
There is no inherent size limit that the pickle module enforces, but you're pushing the boundaries of its intended use. It's not designed for individual large objects. However, since you're using Python 3.5, you will be able to take advantage of PEP 3154, which adds better support for large objects. You should specify pickle.HIGHEST_PROTOCOL when you dump your data.
You will likely have a large performance hit because you're trying to deal with an object that is 3x the size of your memory. Your system will probably start swapping, and possibly even thrashing. RAM is so cheap these days, bumping it up to at least 2GB should help significantly.
To handle the swapping, make sure you have enough swap space available (a large swap partition if you're on Linux, or enough space for the swap file on your primary partition on Windows).
As pal sch's comment shows, Pickle is not very friendly to RAM consumption during the pickling process, so you may have to deal with Python trying to get even more memory from the OS than the 1.5GB we may expect for your object.
Given these considerations, I don't expect it to work out very well for you. I'd strongly suggest upgrading the RAM on your target machine to make this work.
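A minimal sketch of the pickle.HIGHEST_PROTOCOL suggestion above. The helper names save_classifier and load_classifier, and the small dictionary standing in for the TextBlob classifier, are assumptions for illustration only.

```python
import os
import pickle
import tempfile


def save_classifier(obj, path):
    """Dump obj with the newest pickle protocol this interpreter supports."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_classifier(path):
    """Load the object back; the whole object is rebuilt in memory."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in object (hypothetical); the real classifier would go here.
stand_in = {"labels": ["pos", "neg"], "weights": [0.25, 0.75]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "classifier.pkl")
    save_classifier(stand_in, path)
    restored = load_classifier(path)
```

Note that pickle.load still rebuilds the entire object in memory, so a newer protocol does not by itself reduce the RAM needed on the 512 MB server.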
I don't see how you could load an object into RAM that exceeds the RAM, i.e. bytes(num_bytes_greater_than_ram) will always raise a MemoryError.
I'm new to Python, so I apologize for any misconceptions.
I have a python file that needs to read/write to stdin/stdout many many times (hundreds of thousands) for a large data science project. I know this is not ideal, but I don't have a choice in this case.
After about an hour of running (close to halfway completed), the process gets terminated on my mac due to "Low Swap" which I believe refers to lack of memory. Apart from the read/write, I'm hardly doing any computing and am really just trying to get this to run successfully before going any farther.
My Question: Does writing to stdin/stdout a few hundred thousand times use up that much memory? The file basically needs to loop through some large lists (15k ints) and do it a few thousand times. I've got 500 gigs of hard drive space and 12 gigs of ram and am still getting the errors. I even spun up an EC2 instance on AWS and STILL had memory errors. Is it possible that I have some sort of memory leak in the script even though I'm not doing hardly anything? Is there anyway that I reduce the memory usage to run this successfully?
Appreciate any help.
the process gets terminated on my mac due to "Low Swap" which I believe refers to lack of memory
Swap space is not part of main memory (RAM); it is disk space that the operating system uses as an overflow area when RAM fills up.
When a process reads a file, the data is cached in main memory. Once it is no longer needed, those cached pages can simply be discarded.
However, when a process writes, the changes need to be recorded, so the data has to be kept until it has been written out. One problem: what if you are writing every millisecond? If data accumulates in RAM faster than it can be released, the least recently used (LRU) pages are moved to swap space. Swap is finite too, so it can also fill up, and at that point the operating system terminates the process.
Is it possible that I have some sort of memory leak in the script even though I'm not doing hardly anything?
Possibly
Is there anyway that I reduce the memory usage to run this successfully?
One way is to think about how you are managing the file(s). Reads will not put pressure on swap, because data that was only read can be discarded without being saved. You might want to explicitly save the file (closing and reopening it should work) after a certain amount of information has been processed or a certain amount of time has gone by, so that the written data no longer has to be held in memory.
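The periodic-save idea could be sketched as below. The function name write_in_batches and the flush_every parameter are hypothetical, and the original script is not shown, so whether this resolves the "Low Swap" termination depends on where memory actually accumulates.

```python
import io
import sys


def write_in_batches(values, out=sys.stdout, flush_every=1000):
    """Write one value per line, flushing the stream every flush_every lines."""
    written = 0
    for value in values:  # consume lazily instead of building one large string
        out.write(f"{value}\n")
        written += 1
        if written % flush_every == 0:
            out.flush()  # hand buffered output to the OS
    out.flush()
    return written


# Demo against an in-memory stream so the example has no side effects.
buffer = io.StringIO()
count = write_in_batches(range(5), out=buffer, flush_every=2)
```

Passing a generator as values keeps only one item in memory at a time, which matters more than the flush interval if the script currently builds large lists before writing.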
I'm trying to identify a memory leak in a Python program I'm working on. I'm currently running Python 2.7.4 on 64-bit Mac OS. I installed heapy to hunt down the problem.
The program involves creating, storing, and reading a large database using the shelve module. I am not using the writeback option, which I know can create memory problems.
Heapy shows that memory usage is roughly constant during program execution. Yet Activity Monitor shows rapidly increasing memory. Within 15 minutes, the process has consumed all my system memory (16 GB), and I start seeing page outs. Any idea why heapy isn't tracking this properly?
Take a look at this fine article. You are, most likely, not seeing memory leaks but memory fragmentation. The best workaround I have found is to identify what the output of your large working set operation actually is, load the large dataset in a new process, calculate the output, and then return that output to the original process.
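A minimal sketch of that workaround, assuming the output is small enough to pass back as JSON. The worker source, the function name summarize_in_child, and the list standing in for the large dataset are all hypothetical. It uses subprocess for simplicity; the standard multiprocessing module would serve equally well.

```python
import json
import subprocess
import sys

# Hypothetical worker: builds the large working set, prints only a small summary.
WORKER_SOURCE = """
import json, sys
n = int(sys.argv[1])
data = list(range(n))  # stand-in for the large dataset
print(json.dumps({"count": len(data), "total": sum(data)}))
"""


def summarize_in_child(n):
    """Run the heavy step in a separate interpreter and return its small result."""
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE, str(n)],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


summary = summarize_in_child(1000)
```

When the child process exits, the operating system reclaims all of its memory, fragmented or not, so the parent's footprint stays limited to the returned summary.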
This answer has some great insight and an example, as well. I don't see anything in your question that seems like it would preclude the use of PyPy.