I'm doing some extensive scientific Python calculations and want to know the execution time and memory footprint of my script.
So how do I get the peak memory usage of a Python script?
If it matters, I'm on Windows and use Python 2.7.
Sounds like you are looking for a memory profiler.
memory_profiler is one you can dig into: it reports memory usage line by line, so you can see which lines are giving you problems, and with some querying you can figure out which areas consume the most memory.
https://pypi.python.org/pypi/memory_profiler
and since you are using Windows it will also need psutil: https://pypi.python.org/pypi/psutil
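If it helps, here is a minimal sketch of how memory_profiler is typically used: decorate the function you care about and run the script under the profiler (python -m memory_profiler yourscript.py), which prints per-line memory usage for the decorated function. The workload below is just a stand-in for your real calculations.

from memory_profiler import profile

@profile
def crunch():
    # stand-in for the real scientific workload
    data = [x * 2.0 for x in range(10 ** 6)]
    return sum(data)

if __name__ == "__main__":
    crunch()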
Good Luck!
The resource module can give you this on Unix-like systems (it is not available on Windows). It works in both Python 2 and Python 3.
import resource
resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
This is the peak resident set size, in kilobytes on Linux (macOS reports it in bytes). The structure returned by getrusage also includes the user and system CPU time (ru_utime and ru_stime).
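For reference, a minimal sketch of reading those fields (Unix-only, so it will not run on Windows; the units of ru_maxrss depend on the platform as noted above):

import resource

usage = resource.getrusage(resource.RUSAGE_SELF)
print("peak RSS: %s" % usage.ru_maxrss)        # kilobytes on Linux, bytes on macOS
print("user CPU time: %.2fs" % usage.ru_utime)
print("system CPU time: %.2fs" % usage.ru_stime)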
For the peak memory, as you are on Windows, you can use psutil and psutil.Process.memory_info, for example, to get the peak working set size in bytes:
>>> import psutil
>>> p = psutil.Process()
>>> p.memory_info().peak_wset
238530560L
The psutil documentation for memory_info has more details about the Windows-specific fields such as peak_wset.
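Since the question also asks about execution time, here is a rough sketch of my own (not part of the answer above) that times a workload and then reports the peak working set via psutil on Windows; the list comprehension is just a placeholder for the real script:

import time
import psutil

start = time.time()
data = [x ** 2 for x in range(10 ** 6)]   # placeholder workload
elapsed = time.time() - start

proc = psutil.Process()
print("elapsed: %.2fs" % elapsed)
print("peak working set: %d bytes" % proc.memory_info().peak_wset)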
I want to check the memory consumption of my Python code and have therefore added the following lines to it:
import resource
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
As an alternative I have also tried this:
import os
import psutil
process = psutil.Process(os.getpid())
print(process.memory_info().rss)  # in bytes
However, I get different results, for example 866480 from resource and 730689536 from psutil. As noted, the first value is in kilobytes and the second in bytes, but even after converting units there is still a difference: 866480 KB is about 887 million bytes, versus about 731 million bytes from psutil.
Reading the documentation, I still don't understand what causes the difference, so input would be valuable.
TL;DR: resource.getrusage sometimes misses that Python has already released objects from memory.
There was a bug in memory_profiler (which was using resource.getrusage at that time). The related blog post describes the different methods of memory measurement. To quote:
"this approach [resource.getrusage] is several times faster than the one based in psutil [...] The problem with this approach is that it seems to report results that are slightly different in some cases. Notably it seems to differ when objects have been recently liberated from the python interpreter. In the following example, orphaned arrays are liberated by the python interpreter, which is correctly seen by psutil but not by resource..."
I am using the Python-based Sage mathematics software to create a very long list of vectors. The list contains roughly 100,000,000 elements, and sys.getsizeof() tells me that it is a little less than 1GB in size.
I pickle this list into a file (which already takes a long time, but fair enough). It only gets annoying when I unpickle the list: the RAM usage increases from 1.15GB to 4.3GB, and I am wondering what's going on?
How can I find out in Sage what all the memory is used for? And do you have any ideas how to optimize this by maybe applying Python tricks?
This is a reply to the comment of kcrisman.
I cannot post the exact code since it would be too long, but here is a simple example where the phenomenon can be observed. I am working on Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 GNU/Linux.
Start Sage and execute:
import pickle
L = [vector([1,2,3]) for k in range(1000000)]
f = open("mylist", 'w')
pickle.dump(L, f)
On my system the list is 8697472 bytes big, and the file I pickled it into is roughly 130MB. Now close Sage and watch your memory (with htop, for example). Then execute the following lines:
import pickle
f = open("mylist", 'r')
pickle.load(f)
Without Sage my Linux system uses 1035MB of memory; when Sage is running, the usage increases to 1131MB. After I unpickle the file it uses 2535MB, which I find odd.
It's probably better not to use Python's pickle module directly. cPickle is already a bit better, but a lot of pickling in Sage assumes protocol 2, which (c)pickle doesn't use by default. You can use Sage's own wrappers around pickle instead. If I do your example with
sage: open("mylist",'w').write(dumps(L))
and then load it in a fresh session via
sage: L = loads(open("mylist",'r').read())
I observe no problems.
Note that the above interface is not the best way to pickle/unpickle to a file in Sage; you'd be better off using save/load. I just did it that way to stay as close as possible to your example.
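For completeness, a sketch of the save/load route (written from memory, so check the Sage documentation for the exact behaviour): save() writes the object to mylist.sobj using Sage's preferred pickle protocol, and load() reads it back.

sage: L = [vector([1,2,3]) for k in range(1000000)]
sage: save(L, "mylist")
sage: L = load("mylist")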
I'm trying to identify a memory leak in a Python program I'm working on. I'm currently running Python 2.7.4 on 64-bit Mac OS. I installed heapy to hunt down the problem.
The program involves creating, storing, and reading a large database using the shelve module. I am not using the writeback option, which I know can create memory problems.
Heapy shows that during program execution the memory usage is roughly constant. Yet my Activity Monitor shows rapidly increasing memory: within 15 minutes, the process has consumed all my system memory (16GB), and I start seeing page outs. Any idea why heapy isn't tracking this properly?
Take a look at this fine article. You are, most likely, not seeing memory leaks but memory fragmentation. The best workaround I have found is to identify what the output of your large working set operation actually is, load the large dataset in a new process, calculate the output, and then return that output to the original process.
This answer has some great insight and an example, as well. I don't see anything in your question that seems like it would preclude the use of PyPy.
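To make the workaround concrete, here is a sketch (my own, with a toy workload standing in for the shelve-based operation) of pushing the memory-hungry step into a worker process with multiprocessing, so only the compact result comes back and the worker's fragmented heap disappears when it exits:

from multiprocessing import Pool

def heavy_step(n):
    # stand-in for the real large-working-set operation: build something big,
    # reduce it to a small result, and return only that result
    big = [float(i) for i in range(n)]
    return sum(big) / len(big)

if __name__ == "__main__":
    pool = Pool(processes=1)
    result = pool.apply(heavy_step, (10 ** 7,))
    pool.close()
    pool.join()
    # the big list lived only inside the worker, whose memory is returned
    # to the OS once the pool is closed and the worker exits
    print("compact result: %s" % result)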
When I load the file with json, Python's memory usage spikes to about 1.8GB and I can't seem to get that memory released. I put together a very simple test case:
import json

with open("test_file.json", 'r') as f:
    j = json.load(f)
I'm sorry that I can't provide a sample JSON file; my test file has a lot of sensitive information. For context, I'm dealing with a file on the order of 240MB. After running the above lines I have the previously mentioned 1.8GB of memory in use. If I then do del j, memory usage doesn't drop at all. If I follow that with gc.collect(), it still doesn't drop. I even tried unloading the json module and running another gc.collect().
I'm trying to run some memory profiling but heapy has been churning 100% CPU for about an hour now and has yet to produce any output.
Does anyone have any ideas? I've also tried the above using cjson rather than the packaged json module. cjson used about 30% less memory but otherwise displayed exactly the same issues.
I'm running Python 2.7.2 on Ubuntu server 11.10.
I'm happy to load up any memory profiler and see if it does better than heapy, and to provide any diagnostics you might think are necessary. I'm hunting around for a large test JSON file that I can provide for anyone else to give it a go.
I think these two links address some interesting points: this is not necessarily a json issue, but rather a "large object" issue, and it comes down to how memory works in Python versus the operating system.
See "Why doesn't Python release the memory when I delete a large object?" for why memory released by Python is not necessarily reflected in what the operating system reports:
If you create a large object and delete it again, Python has probably released the memory, but the memory allocators involved don’t necessarily return the memory to the operating system, so it may look as if the Python process uses a lot more virtual memory than it actually uses.
And on running memory-hungry work in a subprocess to let the OS handle the cleanup:
The only really reliable way to ensure that a large but temporary use of memory DOES return all resources to the system when it's done, is to have that use happen in a subprocess, which does the memory-hungry work then terminates. Under such conditions, the operating system WILL do its job, and gladly recycle all the resources the subprocess may have gobbled up. Fortunately, the multiprocessing module makes this kind of operation (which used to be rather a pain) not too bad in modern versions of Python.
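A minimal sketch of that idea applied to the json case (the filename is taken from the question; returning len(data) is just a placeholder for whatever small result you actually need): do the json.load() in a child process, ship back only the extracted values, and let the child exit so the OS reclaims the 1.8GB.

import json
from multiprocessing import Process, Queue

def parse_in_child(path, queue):
    with open(path, 'r') as f:
        data = json.load(f)        # the big allocation lives in the child
    queue.put(len(data))           # placeholder: send back only what you need

if __name__ == "__main__":
    q = Queue()
    child = Process(target=parse_in_child, args=("test_file.json", q))
    child.start()
    summary = q.get()
    child.join()
    print("items in file: %d" % summary)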
I am attempting to profile my project in Python, but I am running out of memory.
My project itself is fairly memory-intensive, but even half-size runs are dying with "MemoryError" when run under cProfile.
Doing smaller runs is not a good option, because we suspect that the run time is scaling super-linearly, and we are trying to discover which functions are dominating during large runs.
Why is cProfile taking so much memory? Can I make it take less? Is this normal?
Updated: Since cProfile is built into current versions of Python (as the _lsprof extension), it should be using the main allocator. If this doesn't work for you, Python 2.7.1 has a --with-valgrind configure option which causes it to switch to using malloc() at runtime. This is nice since it avoids having to use a suppressions file. You can build a version just for profiling, and then run your Python app under valgrind to look at all allocations made by the profiler, as well as by any C extensions which use custom allocation schemes.
(Rest of original answer follows):
Maybe try to see where the allocations are going. If you have a place in your code where you can periodically dump out the memory usage, you can use guppy to view the allocations:
import lxml.html
from guppy import hpy

hp = hpy()
trees = {}
for i in range(10):
    # do something
    trees[i] = lxml.html.fromstring("<html>")
    print hp.heap()
    # examine allocations for specific objects you suspect
    print hp.iso(*trees.values())