Stop a program before it uses too much memory - python

I'm working on a Python program which sometimes fills up a list with millions of items. The computer (Ubuntu) starts swapping and the debugger (Eclipse) becomes unresponsive.
Is it possible to add a line inside the loop that checks how much memory is being used and interrupts execution, so I can check what's going on?
I'm thinking about something like:
if usedmemory() > 1000000000:
    pass  # with a breakpoint here
but I don't know what usedmemory() could be.

This is highly dependent on the machine you're running Python on. Here's an SO answer for a way to do it on Linux: https://stackoverflow.com/a/278271/541208, but another answer there offers a more platform-independent solution: https://stackoverflow.com/a/2468983/541208: the psutil library, which you can install via pip install psutil:
>>> import psutil
>>> psutil.virtual_memory()
vmem(total=8374149120L, available=2081050624L, percent=75.1, used=8074080256L, free=300068864L, active=3294920704, inactive=1361616896, buffers=529895424L, cached=1251086336)
>>> psutil.swap_memory()
swap(total=2097147904L, used=296128512L, free=1801019392L, percent=14.1, sin=304193536, sout=677842944)
So you'd look at the percent field to see how much memory is in use, and pause or kill your process depending on how much it has been using.
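For example, a minimal sketch of that idea, where the 90% threshold, the produce_items() generator and the pdb breakpoint are only placeholders for your own loop and limits:
import pdb
import psutil

def memory_usage_too_high(limit_percent=90):
    # Percentage of total physical memory currently in use
    return psutil.virtual_memory().percent > limit_percent

def produce_items():
    # Stand-in for whatever actually fills the list in the real program
    while True:
        yield list(range(10000))

big_list = []
for item in produce_items():
    big_list.append(item)
    if memory_usage_too_high():
        pdb.set_trace()  # pause here and inspect big_list in the debugger
        break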

Related

Python script gets "killed"

I am facing a problem with a Python script getting killed. I had always used this script with no problem at all until two days ago; then, without any change in the code, it started to print the string 'killed' before aborting execution.
Other people have tried to run the same code on their system and it works fine, as it used to do with me until two days ago.
I have read some similar old questions, and I gather the problem could be an out-of-memory issue due to bad memory management in my code. That sounds a little strange to me, since the script used to work perfectly until a few days ago and the problem appears on my system only.
Do you have any idea on how to inspect the problem and find a possible solution, please?
Python version: Python 2.7.14+
System: Scientific Linux CERN 7
In your case, it's highly probable that the script reached some limit on the amount of resources it is allowed to use, which depends on your OS and other parameters. Are you running something else alongside the script, or are there many open files, etc.?
The most likely reason for such an error is excessive memory use, which forces the system to stop taking risks and kill the process when further allocations start failing. Maybe you can print, alongside the processing, the total memory your process is using to get a glimpse of what's happening, since the information you've given isn't enough on its own:
import os, psutil
process = psutil.Process(os.getpid())
then (for Python 3):
print(process.memory_info().rss)
or (for Python 2.7, tested):
print(process.memory_info()[0])
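For instance, a sketch that logs the resident set size every so often inside the suspect loop; the while loop below just stands in for your real workload, and the print interval is arbitrary:
import os
import psutil

process = psutil.Process(os.getpid())

records = []
i = 0
while i < 5000000:  # stand-in for the real workload
    records.append(i * i)
    if i % 500000 == 0:
        # rss is the resident set size in bytes, i.e. the physical memory used by this process
        print("iteration %d, rss = %.1f MB" % (i, process.memory_info().rss / 1024.0 / 1024.0))
    i += 1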

python: how to check the use of an external program

In Python, how do you check that an external program is running? I'd like to track my use of some programs, so I can see the amount of time I've spent with them. For example, if I launch my program, I want to be able to see if Chrome has already been launched and, if so, start a timer which would end when I exit Chrome.
I've seen that the subprocess module can launch external programs, but this is not what I'm looking for.
Thanks in advance.
You are looking for psutil
It's great for getting information on the system (CPU / RAM / HD / ...) and, in your case, on processes: https://pythonhosted.org/psutil/#processes
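As a rough sketch of what that could look like with psutil's process API; the process name 'chrome' and the 5-second polling interval are assumptions (the name differs per platform, e.g. 'Google Chrome' on macOS):
import time
import psutil

def is_running(name):
    # Scan all running processes and compare names case-insensitively
    for proc in psutil.process_iter():
        try:
            if name.lower() in proc.name().lower():
                return True
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass
    return False

start = None
while True:
    if is_running('chrome'):
        if start is None:
            start = time.time()
    elif start is not None:
        print('Chrome was open for %.0f seconds' % (time.time() - start))
        start = None
    time.sleep(5)  # poll every few seconds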
Obtaining information on running processes in general depends on the operating system you are using. The Python standard library does not contain a platform-independent way of obtaining this information. There are, however, third-party libraries for this purpose, e.g. psutil.
In my case I would try something using task-manager-style data, probably via subprocess.check_output(['ps']) (that looks good enough to me), but you can also use the psutil library.
Tell us what you did later :)

Huge memory usage of Python's json module?

When I load the file with json, Python's memory usage spikes to about 1.8GB and I can't seem to get that memory to be released. I put together a test case that's very simple:
with open("test_file.json", 'r') as f:
j = json.load(f)
I'm sorry that I can't provide a sample JSON file; my test file has a lot of sensitive information, but for context, I'm dealing with a file on the order of 240MB. After running the above 2 lines I have the previously mentioned 1.8GB of memory in use. If I then do del j, memory usage doesn't drop at all. If I follow that with a gc.collect() it still doesn't drop. I even tried unloading the json module and running another gc.collect().
I'm trying to run some memory profiling but heapy has been churning 100% CPU for about an hour now and has yet to produce any output.
Does anyone have any ideas? I've also tried the above using cjson rather than the packaged json module. cjson used about 30% less memory but otherwise displayed exactly the same issues.
I'm running Python 2.7.2 on Ubuntu server 11.10.
I'm happy to load up any memory profiler and see if it does better than heapy, and provide any diagnostics you might think are necessary. I'm hunting around for a large test JSON file that I can provide for anyone else to give it a go.
I think these two links address some interesting points about this not necessarily being a json issue, but rather just a "large object" issue, and about how memory works with Python vs. the operating system.
See Why doesn't Python release the memory when I delete a large object? for why memory released by Python is not necessarily reflected by the operating system:
If you create a large object and delete it again, Python has probably released the memory, but the memory allocators involved don’t necessarily return the memory to the operating system, so it may look as if the Python process uses a lot more virtual memory than it actually uses.
About running large object processes in a subprocess to let the OS deal with cleaning up:
The only really reliable way to ensure that a large but temporary use of memory DOES return all resources to the system when it's done, is to have that use happen in a subprocess, which does the memory-hungry work then terminates. Under such conditions, the operating system WILL do its job, and gladly recycle all the resources the subprocess may have gobbled up. Fortunately, the multiprocessing module makes this kind of operation (which used to be rather a pain) not too bad in modern versions of Python.
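A minimal sketch of that subprocess approach using the multiprocessing module; the len(data) return value is just a stand-in for whatever small result the parent actually needs:
import json
import multiprocessing

def parse_and_summarize(path):
    # Runs in a child process; all memory allocated while parsing is
    # returned to the OS when the child exits.
    with open(path, 'r') as f:
        data = json.load(f)
    # Only this small return value travels back to the parent.
    return len(data)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=1)
    result = pool.apply(parse_and_summarize, ('test_file.json',))
    pool.close()
    pool.join()
    # The parent never holds the full parsed structure, so its
    # memory footprint stays small.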

Memory leak when running python in Mac OS Terminal

I just ran a Python program in the Mac OS Terminal, and there is an unusual memory leak.
The program is simple like this:
for i in xrange(1000000000, 2000000000, 10):
    i2 = i * i
    print i, i2, str(i2)[::2]
    if str(i2)[::2] == '1234567890':
        break
When the program is running, it consumes more and more memory till it uses up all my memory.
When I terminate the program, my Terminal.app still consumes a lot of memory, so I guess it's a bug in Terminal.app?
Does anyone have similar experience?
This isn't a bug; it's actually a feature. Terminal.app, like many other terminal emulators, saves recent output in a buffer so that you can scroll back (with page up or the scroll bar). You can limit how large this is by going to Terminal -> Preferences -> Settings and setting the scrollback limit to something other than Unlimited.
It's not Python that is leaking memory. Look closer. On my machine, the Python process remains at a quiet, stable 3.5 MB of memory.
The memory usage increase you see is most likely due to the Terminal never discarding output. You can alter this behavior by going to Preferences, Settings, and setting the scrollback limit to something other than "Unlimited".

How can I profile a multithreaded program?

I have a program that is performing waaaay under par, and I would like to profile it. However, it is multithreaded, so I can't seem to find a good way to profile this thing. Any advice?
I've tried yappi, but it segfaults on OS X :(
EDIT: This is in python, sorry for putting it under profiling...
Are you multithreading or multiprocessing? If you are just multithreading, then that is the problem. Python currently has problems with multithreading on a multiprocessor system because of the Global Interpreter Lock (GIL). They are working on fixing it for Python 3.2 - at least so that your program will run as fast on a single core as on multiple cores.
If you aren't convinced take a look at the shootout results for the thread-ring program. Running with a single core is faster than running with quad cores.
Now, if you use multiprocessing instead, profiling can be difficult as well, because then you have to run cProfile from each separate process. There are some questions that point you in the right direction though.
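A sketch of that per-process approach, where each worker wraps its work in its own cProfile.Profile and dumps the stats to a file named after its pid; the worker() body is just a stand-in for your real work:
import cProfile
import multiprocessing
import os

def worker(n):
    # Stand-in for the real work you want to profile
    return sum(i * i for i in range(n))

def profiled_worker(n):
    # Each process writes its own profile file, which you can
    # inspect afterwards with the pstats module.
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        return worker(n)
    finally:
        profiler.disable()
        profiler.dump_stats('profile_%d.prof' % os.getpid())

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(profiled_worker, [10 ** 6] * 8)
    pool.close()
    pool.join()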
Depending on how far you've come in your troubleshooting, there are some tools that might point you in the right direction.
"top" is a helpful start to show you if your problem is burning CPU time or simply waiting for stuff.
"dtruss -c" can show you where you spend time and what system calls takes most of your time.
Both these can give you a hint without knowing anything about python.
If you just want to use yappi, it isn't too much work to set up a virtualbox and install some sort of Linux on your machine. I find myself doing that from time to time when I want to try something.
There might of course be things I don't know about that make it impossible or not worth the effort. Also, profiling on another OS running virtualized might not give exactly the same results, but it might still be helpful.
