How to get around a memory allocation error in Python nosetests?

I have a Python script that allocates a huge amount of memory and eventually runs out. Is there any way nosetests can handle this gracefully?

Unfortunately, the only way to survive such a thing is to have your test fixture run that particular test in a subprocess using subprocess.Popen(), capturing its output and exit code so that you can see the nonzero exit code and the "out of memory" traceback that result. Note that sys.executable is the full path to the current Python executable, if that helps you build a Popen() command line to run Python on the little test script that runs out of memory.
Once a process is out of memory, there is typically no way to recover, because nearly anything it might try to do (formatting a string to print, for example) takes even more memory, which is, by definition, now exhausted. :)
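A minimal sketch of such a fixture, assuming a hypothetical script allocate_too_much.py that exhausts memory (the script name and the exact assertions are illustrative, not from the original answer):

import subprocess
import sys
import unittest

class TestMemoryHog(unittest.TestCase):
    def test_allocation_fails_cleanly(self):
        # Run the memory-hungry script in its own interpreter, so only the
        # child process dies when memory runs out.
        proc = subprocess.Popen(
            [sys.executable, "allocate_too_much.py"],  # hypothetical script
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        out, err = proc.communicate()
        # Expect a nonzero exit code and, if the child got far enough to
        # print a traceback, a MemoryError on stderr.
        self.assertNotEqual(proc.returncode, 0)
        self.assertIn(b"MemoryError", err)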

Related

Is there no way to parallelize calls to an executable in Windows

I've got a list of files, and for each of them I'm calling sox. Because it takes a while, I thought I'd speed the process up by parallelizing it; each call to sox is independent of the others, so I thought it would be simple.
But it seems you cannot call the same executable from a different process, as that leads to a "The process cannot access the file because it is being used by another process" error.
I'm guessing that's the cause, because there's no other file I'm using across different processes. And yet I'm quite surprised by this: why would read-only access not be possible? And does that really mean there's absolutely no way to speed my program up?
Found the error. I had 2> $nul at the end of my sox command to suppress the output. That was of course causing the issue. :D
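For what it's worth, here is a hedged sketch of one way to run the sox calls in parallel without shell redirection (the file names are made up and sox is assumed to be on PATH); stderr is discarded with subprocess.DEVNULL instead of 2> $nul:

from concurrent.futures import ThreadPoolExecutor
import subprocess

def convert(in_path, out_path):
    # Each sox invocation is an independent child process; DEVNULL replaces
    # the 2> $nul redirection that caused the file-sharing error.
    subprocess.run(["sox", in_path, out_path],
                   stderr=subprocess.DEVNULL, check=True)

files = [("in1.wav", "out1.wav"), ("in2.wav", "out2.wav")]  # hypothetical list
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(convert, src, dst) for src, dst in files]
    for future in futures:
        future.result()  # re-raise any sox failure here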

Does executing a python script load it into memory?

I'm running a Python script using python3 myscript.py on Ubuntu 16.04. Is the script loaded into memory in its entirety, or is it read and interpreted line by line from the disk? If it's not loaded all at once, is there any way of knowing or controlling how big the chunks are that get loaded into memory?
It is loaded into memory in its entirety. This must be the case, because a syntax error near the end will abort the program straight away. Try it and see.
There does not need to be any way to control or configure this. It is surely an implementation detail best left alone. If you have a problem related to this (e.g. your script is larger than your RAM), it can be solved some other way.
The "script" you see is only the human-friendly representation. Python opens the script, reads its lines, tokenizes them, builds a parse tree and an AST from them, and then emits bytecode, which you can inspect using the dis module.
The "script" itself isn't loaded; its code object (the object that contains the instructions generated from it) is. There's no direct way to affect that process. I have never heard of a script so big that it needed to be read in chunks; I'd be surprised if you managed to write one.
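For instance, a tiny illustration of looking at that bytecode with dis (the function is made up):

import dis

def greet(name):
    return "Hello, " + name

dis.dis(greet)  # prints the bytecode instructions of greet's code object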

Python Code Coverage and Multiprocessing

I use coveralls in combination with coverage.py to track python code coverage of my testing scripts. I use the following commands:
coverage run --parallel-mode --source=mysource --omit=*/stuff/idont/need.py ./mysource/tests/run_all_tests.py
coverage combine
coveralls --verbose
This works quite nicely, with the exception of multiprocessing: code executed by worker pools or child processes is not tracked.
Is there a way to track multiprocessing code as well? Is there a particular option I'm missing? Maybe adding wrappers to the multiprocessing library to start coverage every time a new process is spawned?
EDIT:
I (and jonrsharpe, also :-) found a monkey-patch for multiprocessing.
However, this does not work for me: my Travis CI build is killed almost immediately after it starts. I checked the problem on my local machine, and apparently adding the patch to multiprocessing blows up my memory usage. Tests that normally need much less than 1 GB of memory need more than 16 GB with this fix.
EDIT2:
The monkey-patch does work after a small modification: removing the config_file parsing (config_file=os.environ['COVERAGE_PROCESS_START']) did the trick. This solved the memory bloat. Accordingly, the corresponding line simply becomes:
cov = coverage(data_suffix=True)
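For reference, this is roughly the shape such a patch takes after that modification; it is a hedged reconstruction rather than the exact code from the linked patch, and the class name is invented:

import multiprocessing
from coverage import coverage

_original_bootstrap = multiprocessing.Process._bootstrap

class _ProcessWithCoverage(multiprocessing.Process):
    def _bootstrap(self, *args, **kwargs):
        # Start a separate collector in every child; data_suffix=True makes each
        # child write its own .coverage.<suffix> file for 'coverage combine'.
        cov = coverage(data_suffix=True)  # no config_file= here; that caused the memory bloat
        cov.start()
        try:
            return _original_bootstrap(self, *args, **kwargs)
        finally:
            cov.stop()
            cov.save()

multiprocessing.Process = _ProcessWithCoverage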
Coverage 4.0 includes a command-line option --concurrency=multiprocessing to deal with this. You must use coverage combine afterward. For instance, if your tests are in regression_tests.py, then you would simply do this at the command line:
coverage run --concurrency=multiprocessing regression_tests.py
coverage combine
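A hypothetical regression_tests.py that spawns a pool, just to show what now gets measured; the pool is shut down with close()/join() rather than terminate() so the workers can flush their coverage data (see the next answer):

from multiprocessing import Pool

def square(n):
    return n * n  # runs in the worker processes; measured with --concurrency=multiprocessing

if __name__ == "__main__":
    pool = Pool(4)
    try:
        print(pool.map(square, range(10)))
    finally:
        pool.close()  # let the workers finish and exit normally
        pool.join()   # wait, so their coverage data files get written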
I'd spent some time trying to make sure coverage works with multiprocessing.Pool, but it never worked.
I have finally made a fix that makes it work; I'd be happy if someone pointed out whether I'm doing something wrong.
https://gist.github.com/andreycizov/ee59806a3ac6955c127e511c5e84d2b6
One possible cause of missing coverage data from forked processes, even with concurrency=multiprocessing, is the way multiprocessing.Pool is shut down. For example, the with statement leads to a terminate() call (see __exit__ here). As a consequence, pool workers have no time to save their coverage data. I had to use a close(), timed join() (in a thread), terminate() sequence instead of with to get the coverage results saved.
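A hedged sketch of that shutdown sequence (the helper name and the timeout value are made up):

import threading
from multiprocessing import Pool

def shutdown_pool(pool, timeout=30):
    # close() instead of terminate(): workers finish their tasks and exit
    # normally, which gives coverage a chance to write its data files.
    pool.close()
    joiner = threading.Thread(target=pool.join)
    joiner.start()
    joiner.join(timeout)  # timed join so a stuck worker cannot hang the run
    if joiner.is_alive():
        pool.terminate()  # last resort; anything killed here loses its coverage data
        joiner.join()

pool = Pool(4)
try:
    results = pool.map(abs, [-3, -2, -1])  # placeholder workload
finally:
    shutdown_pool(pool)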

Python Nosetests and Sniffer: Viewing leftover state

I'm working on a project that uses Nosetests and the Sniffer autotester, and I ran into an odd occurrence with Sniffer. When I first run Sniffer, one of my route tests passes as expected, but every subsequent time it runs (as a result of saving a file), that test fails, generating an
OverflowError: Maximum recursion level reached.
My project uses Pandas, and it appears to bomb out inside a to_json call, but since the error only occurs when Sniffer makes a second run, I feel the issue isn't with Pandas but with Sniffer. I cannot reproduce the error when running Nose via the nosetests command, which is more evidence that the problem lies with the autotester.
Is there a good way to debug such a problem? It seems as if there's some leftover state from the first time Sniffer runs that trips it up on the second run, but as I'm not quite sure how to debug this, I can't be sure. Any ideas? I can provide code if necessary, but this is more of a general "how do I debug this?" kind of question.
Try running your nosetests with --pdb. It will drop you into a debugger prompt on the OverflowError, and you'll be able to see the stack trace of the calls.

How to access a data structure from a currently running Python process on Linux?

I have a long-running Python process that is generating more data than I planned for. My results are stored in a list that will be serialized (pickled) and written to disk when the program completes -- if it gets that far. But at this rate, it's more likely that the list will exhaust all 1+ GB free RAM and the process will crash, losing all my results in the process.
I plan to modify my script to write results to disk periodically, but I'd like to save the results of the currently-running process if possible. Is there some way I can grab an in-memory data structure from a running process and write it to disk?
I found code.interact(), but since I don't have this hook in my code already, it doesn't seem useful to me (Method to peek at a Python program running right now).
I'm running Python 2.5 on Fedora 8. Any thoughts?
Thanks a lot.
Shahin
There is not much you can do for a running program. The only thing I can think of is to attach the gdb debugger, stop the process, and examine the memory. Alternatively, make sure your system is set up to save core dumps, then kill the process with kill -SIGSEGV <pid>. You should then be able to open the core dump with gdb and examine it at your leisure.
There are some gdb macros that will let you examine Python data structures and execute Python code from within gdb, but for these to work you need to have compiled Python with debug symbols enabled, and I doubt that is your case. Creating a core dump first and then recompiling Python with symbols will NOT work, since all the addresses will have changed from the values in the dump.
Here are some links for introspecting python from gdb:
http://wiki.python.org/moin/DebuggingWithGdb
http://chrismiles.livejournal.com/20226.html
or google for 'python gdb'
N.B. To tell Linux to create core dumps, use the ulimit command.
ulimit -a will show you what the current limits are set to.
ulimit -c unlimited will enable core dumps of any size.
While certainly not very pretty, you could try to access the data of your process through the proc filesystem: /proc/[pid-of-your-process]. The proc filesystem stores a lot of per-process information, such as currently open file descriptors, memory maps and so on. With a bit of digging you might be able to access the data you need.
Still, I suspect you should rather look at this from within Python and do some runtime logging and debugging.
+1 Very interesting question.
I don't know how well this might work for you (especially since I don't know if you'll reuse the pickled list in the program), but I would suggest this: as you write to disk, print the list to STDOUT. When you run your Python script (I'm guessing also from the command line), redirect the output to append to a file, like so:
python myScript.py >> logFile
This should store all the lists in logFile.
This way, you can always take a look at what's in logFile, and you'll have the most up-to-date data structures in there (depending on where you call print).
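A hedged sketch of that idea, assuming the results live in a list called results and a snapshot is printed every so often (all names here are made up):

import sys

results = []

def record(item, snapshot_every=100):
    # Append a result and periodically dump the whole list to stdout, so that
    # 'python myScript.py >> logFile' keeps a recoverable copy on disk.
    results.append(item)
    if len(results) % snapshot_every == 0:
        print(repr(results))
        sys.stdout.flush()  # push the snapshot to logFile even if we crash later

for i in range(1000):  # placeholder for the real long-running computation
    record(i * i)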
Hope this helps
This answer has info on attaching gdb to a python process, with macros that will get you into a pdb session in that process. I haven't tried it myself but it got 20 votes. Sounds like you might end up hanging the app, but also seems to be worth the risk in your case.
