How can I interpret this Python memory profile? - python

Hello, I am running a Python script and using memray to watch its memory usage. I got the following graph:
My question is that I am not sure of the difference between heap size and resident size.
I read here https://pythonspeed.com/articles/measuring-memory-python/ that resident size is the RAM usage, but I can't find information about heap size. Is it the memory I really need to execute the script, or is it only space that Python reserves for execution but doesn't actually use?
How can I read this chart?
Thanks
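Broadly, the heap size is the memory the program has actually requested from the allocator for live objects, while the resident size (RSS) is the physical RAM the operating system currently has mapped for the whole process, including the interpreter itself and pages the allocator keeps around after objects are freed; RSS is therefore usually the larger number and the one that matches what a task manager reports. A minimal sketch of the distinction, using the standard-library tracemalloc for Python-level allocations and the third-party psutil for RSS (these are stand-ins for illustration, not what memray itself measures):

# Sketch: heap-style allocations vs. resident set size.
# Assumes psutil is installed; tracemalloc is in the standard library.
import tracemalloc
import psutil

tracemalloc.start()

data = [b"x" * 1024 for _ in range(100_000)]  # roughly 100 MB of Python objects

heap_current, heap_peak = tracemalloc.get_traced_memory()   # bytes tracked by the allocator
resident = psutil.Process().memory_info().rss               # bytes of physical RAM in use

print(f"heap (tracked allocations): {heap_current / 1e6:.1f} MB")
print(f"resident set size (RSS):    {resident / 1e6:.1f} MB")

del data  # freeing objects shrinks the heap right away; RSS may stay high for a while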

Related

How to solve Python RAM leak when running large script

I have a massive Python script I inherited. It runs continuously on a long list of files, opens them, does some processing, creates plots, writes some variables to a new text file, then loops back over the same files (or waits for new files to be added to the list).
My memory usage steadily goes up to the point where my RAM is full within an hour or so. The code is designed to run 24/7/365 and apparently used to work just fine. I see the RAM usage steadily going up in task manager. When I interrupt the code, the RAM stays used until I restart the Python kernel.
I have used sys.getsizeof() to check all my variables and none are unusually large or increasing with time. This is odd - where is the RAM going then? The text files I am writing to? I have checked, and as far as I can tell, every file creation ends with an f.close() statement, closing the file. Similar for the plots that I create (I think).
What else would be steadily eating away at my RAM? Any tips or solutions?
What I'd like to do is some sort of "close all open files/figures" command at some point in my code. I am aware of the del command but then I'd have to list hundreds of variables at multiple points in my code to routinely delete them (plus, as I pointed out, I already checked getsizeof and none of the variables are large. Largest was 9433 bytes).
Thanks for your help!
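One common culprit that sys.getsizeof() will never reveal is plotting: matplotlib's pyplot keeps a reference to every figure until it is explicitly closed, so figures created in a loop accumulate even though no individual variable looks large. A minimal sketch of the kind of per-iteration cleanup being asked about, assuming the plots are matplotlib figures (the function and file names here are made up for illustration):

# Sketch: close figures and files every iteration so nothing accumulates.
import gc
import matplotlib.pyplot as plt

def process_file(path):
    with open(path) as f:          # 'with' guarantees the file is closed
        data = f.read()
    fig, ax = plt.subplots()
    ax.plot(range(len(data)))
    fig.savefig(path + ".png")
    plt.close(fig)                 # release the figure; plt.close('all') also works
    gc.collect()                   # optional: force a collection pass each loop

for path in ["a.txt", "b.txt"]:    # hypothetical file list
    process_file(path)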

Programmatically testing memory usage in Google Colab (Python 3.x)

I am trying to write a program that, at a certain level of memory left, will write a list to a file to free up memory in Google Colab. I can't find a way to programmatically test for the amount of memory left using Python. To be clear, I'm not looking for a way to save to a file (I already know that); I'm looking for a way to test the amount of memory left. The code I'm looking for would work something like this:
memory_left = memory_function/method()
if memory_left <= memory_threshold:
    save_file()
Another solution would be using the memory profiler package to test the size of an object and save the file when it gets to a certain size, but I don't think that will work, because I'm going to have a dynamic environment of ever-increasing memory usage. This means there might not be enough memory later on for a file of 100 MB when there are only 10 MB left.
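A minimal sketch of the check described above, using psutil (normally already present in a Colab runtime, otherwise pip install psutil); memory_threshold and save_file() are the hypothetical names from the question:

# Sketch: read available system memory and flush to disk below a threshold.
import psutil

memory_threshold = 500 * 1024 * 1024   # e.g. 500 MB, an assumed threshold

def save_file():
    ...  # placeholder for the existing save logic

memory_left = psutil.virtual_memory().available  # bytes of RAM still available
if memory_left <= memory_threshold:
    save_file()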

Difference between shared and unshared memory size

I am trying to find out how to see within a Python script (without any external lib) the RAM currently used by this script.
Some searching here pointed me to the resource module: http://docs.python.org/2/library/resource.html#resource-usage
And there, I see there are two kinds of memory, shared and unshared.
I was wondering what they describe? Hard drive versus RAM? Something about multi-thread memory? Or something else?
Also, I do not think this actually helps me find out the current RAM usage, right?
Thanks
RAM is allocated in chunks called pages. Some of these pages can be marked read-only, such as those in the text segment that contain the program's instructions. If a page is read-only, it is available to be shared between more than one process. This is the shared memory you see. Unshared memory is everything else that is specific to the currently running process, such as allocations from the heap.
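A minimal, Linux-only sketch with no external libraries (shown in Python 3 syntax): /proc/self/statm reports the process's memory in pages, and resident minus shared approximates the unshared (private) portion described above:

# Sketch: read the process's resident, shared, and approximate unshared memory.
import resource

page_size = resource.getpagesize()
with open("/proc/self/statm") as f:
    size, resident, shared = [int(x) * page_size for x in f.read().split()[:3]]

print(f"resident (RSS):     {resident / 1e6:.1f} MB")
print(f"shared:             {shared / 1e6:.1f} MB")
print(f"unshared (approx.): {(resident - shared) / 1e6:.1f} MB")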

Python process consuming increasing amounts of system memory, but heapy shows roughly constant usage

I'm trying to identify a memory leak in a Python program I'm working on. I'm currently running Python 2.7.4 on 64-bit Mac OS. I installed heapy to hunt down the problem.
The program involves creating, storing, and reading a large database using the shelve module. I am not using the writeback option, which I know can create memory problems.
Heapy shows that memory usage is roughly constant during program execution. Yet my activity monitor shows rapidly increasing memory. Within 15 minutes, the process has consumed all my system memory (16 GB), and I start seeing page-outs. Any idea why heapy isn't tracking this properly?
Take a look at this fine article. You are, most likely, not seeing memory leaks but memory fragmentation. The best workaround I have found is to identify what the output of your large working set operation actually is, load the large dataset in a new process, calculate the output, and then return that output to the original process.
This answer has some great insight and an example, as well. I don't see anything in your question that seems like it would preclude the use of PyPy.
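A minimal sketch of that workaround: the allocation-heavy step runs in a short-lived worker process, so however badly its heap fragments, the memory is returned to the OS when the worker exits, and only the small result crosses back to the parent. build_large_result() is a hypothetical stand-in for the real shelve-scanning work:

# Sketch: isolate the large working set in a worker process that exits after one task.
from multiprocessing import Pool

def build_large_result(path):
    # hypothetically: open the shelve file at 'path', scan it, reduce it to a small summary
    big = [i * i for i in range(5000000)]
    return sum(big)          # only this small value is sent back to the parent

if __name__ == "__main__":
    pool = Pool(processes=1, maxtasksperchild=1)   # worker exits after a single task
    result = pool.apply(build_large_result, ("data.shelve",))
    pool.close()
    pool.join()
    print(result)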

Why does running SQLite (through python) cause memory to "unofficially" fill up?

I'm dealing with some big (tens of millions of records, around 10 GB) database files using SQLite. I'm doing this through Python's standard sqlite3 interface.
When I try to insert millions of records into the database, or create indices on some of the columns, my computer slowly runs out of memory. If I look at the normal system monitor, it looks like the majority of the system memory is free. However, when I use top, it looks like I have almost no system memory free. If I sort the processes by their memory consumption, none of them uses more than a couple of percent of my memory (including the Python process that is running sqlite).
Where is all the memory going? Why do top and Ubuntu's system monitor disagree about how much system memory I have? Why does top tell me that I have very little memory free, and at the same time not show which process(es) is (are) using all the memory?
I'm running Ubuntu 11.04, sqlite3, Python 2.7.
Ten to one says you are being confused by Linux's filesystem buffer/cache.
see
ofstream leaking memory
https://superuser.com/questions/295900/linux-sort-all-data-in-memory/295902#295902
Test it by doing (as root)
echo 3 > /proc/sys/vm/drop_caches
The memory may not be assigned to any process; it can be, for example, a file on a tmpfs filesystem (/dev/shm, sometimes /tmp). You should show us the output of top or free (note that those tools do not show a single 'memory usage' value) so we can tell you more about the memory usage.
In the case of inserting records into a database, it may be a temporary image created for the current transaction before it is committed to the real database. Splitting the insertion into many separate transactions (if applicable) may help; see the sketch after this answer.
I am just guessing, not enough data.
P.S. It seems I mis-read the original question (I assumed the computer slows down) and there is no problem. sehe's answer is probably better.
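A minimal sketch of the batched-transaction suggestion above, with made-up table and column names: committing every N rows keeps SQLite's per-transaction working set bounded instead of building one huge transaction:

# Sketch: insert in batches of 50,000 rows, committing each batch.
import sqlite3

conn = sqlite3.connect("big.db")
conn.execute("CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, payload TEXT)")

batch = []
for i in range(1000000):                  # stand-in for the real record source
    batch.append((i, "row %d" % i))
    if len(batch) >= 50000:               # commit periodically instead of all at once
        conn.executemany("INSERT INTO records VALUES (?, ?)", batch)
        conn.commit()
        batch = []

if batch:                                  # flush the final partial batch
    conn.executemany("INSERT INTO records VALUES (?, ?)", batch)
    conn.commit()
conn.close()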
