I've got a list of files, and I'm calling sox on each of them. Because it takes a while, I thought I'd speed the process up by parallelizing it; each call to sox is independent of the others, so I assumed it would be a simple thing.
But it seems you cannot call the same executable from multiple processes, as that leads to a The process cannot access the file because it is being used by another process. error.
I'm guessing that is the cause, because there's no other file I'm using across the different processes. And yet I'm quite surprised by this: why would read-only access not be possible? And does that really mean there's absolutely no way for me to speed my program up?
Found the error. At the end of my sox command I had 2> $nul to suppress the output. That was of course what was causing the issue. :D
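In case anyone else hits this: presumably every parallel worker was trying to write to the same literal $nul file. A minimal sketch of discarding the output per process instead, assuming the calls are driven from Python (the question doesn't say what drives them) and with placeholder file names and sox arguments:
# Sketch: run sox over many files in parallel and discard its output
# per process, instead of redirecting every worker's stderr to one
# shared file. File names and sox arguments are placeholders.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def convert(path):
    return subprocess.run(
        ["sox", path, path + ".out.wav"],   # placeholder sox invocation
        stdout=subprocess.DEVNULL,          # no shared redirect target
        stderr=subprocess.DEVNULL,
        check=True,
    )

files = ["a.wav", "b.wav", "c.wav"]         # hypothetical input list
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(convert, files))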
I'm having a strange issue when trying to run a simple "hello world" program with MPI.
I eventually want to use 100 processes for this MPI script I'm writing in Python, and I was even able to run the hello world test earlier with up to 100 processes. However, now I keep encountering the same error when I try to run the script with ~50 processes.
The specific error I see is:
ORTE_ERROR_LOG: The system limit on number of network connections a process can open was reached in file util/listener.c at line 321
After trying to research this, I understand that it has something to do with a process running out of file descriptors, and the most common explanations say that a file is not being closed properly. However, my issue is that I'm not opening any files. My script is just:
print('I am process:', rank)
So what could the issue be stemming from here?
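For completeness, the whole script, with the imports it implies, is essentially the following (assuming mpi4py):
# hello.py - minimal mpi4py hello world, run e.g. with:
#   mpirun -np 50 python hello.py
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()
print('I am process:', rank)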
I seem to have found a slight workaround.
I am working on a Mac, so I'm assuming that earlier I was staying under the default open-file limit set by the OS. By raising the maximum file limit, I was able to get past the limit I was originally hitting, which had been causing my program to crash.
This fix isn't ideal, since my script now takes quite a while to run, but it is at least a temporary one until I can find a better fix.
If anyone would like to attempt this, the solution I found was posted by #tombigel on GitHub and can be found here.
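If you just want to see which limit you are running into, the per-process value can be inspected (and, up to the hard limit, raised) from Python with the resource module. This is only a diagnostic sketch: it changes the soft limit of the current process alone, whereas the gist linked above raises the system-wide limit that mpirun's daemons also need on macOS.
# Sketch: inspect and (optionally) raise this process's open-file limit.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open-file limit: soft=%s, hard=%s' % (soft, hard))

# Bump the soft limit towards a hypothetical target, capped at the hard limit.
target = 4096
if hard != resource.RLIM_INFINITY:
    target = min(target, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (max(soft, target), hard))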
I am writing an OpenMDAO problem that calls a group of external codes in a parallel group. One of these external codes is a PETSc-based Fortran FEM code. I realize this is potentially problematic, since OpenMDAO also utilizes PETSc. At the moment, I'm calling the external code in a component using Python's subprocess module.
If I run my OpenMDAO problem in serial (i.e. python2.7 omdao_problem.py), everything, including the external code, works just fine. When I try to run it in parallel, however (i.e. mpirun -np 4 python2.7 omdao_problem.py) then it works up until the subprocess call, at which point I get the error:
*** Process received signal ***
Signal: Segmentation fault: 11 (11)
Signal code: Address not mapped (1)
Failing at address: 0xe3c00
[ 0] 0 libsystem_platform.dylib 0x00007fff94cb652a _sigtramp + 26
[ 1] 0 libopen-pal.20.dylib 0x00000001031360c5 opal_timer_darwin_bias + 15469
*** End of error message ***
I can't make much of this, but it seems reasonable to me that the problem would come from using an MPI-based Python code to call another MPI-enabled code. I've tried using a non-MPI "hello world" executable in the external code's place, and that can be called by the parallel OpenMDAO code without error. I do not need the external code to actually run in parallel, but I do need to use the PETSc solvers and such, hence the inherent reliance on MPI. (I guess I could consider having both an MPI-enabled and a non-MPI-enabled build of PETSc lying around? I'd prefer not to do that if possible, as I can see it becoming a mess in a hurry.)
I found this discussion which appears to present a similar issue (and further states that using subprocess in an MPI code, as I'm doing, is a no-no). In that case, it looks like using MPI_Comm_spawn may be an option, even though it isn't intended for that use. Any idea if that would work in the context of OpenMDAO? Other avenues to pursue for getting this to work? Any thoughts or suggestions are greatly appreciated.
You don't need to call the external code as a sub-process. Wrap the Fortran code in Python using f2py and pass a comm object down into it. This docs example shows how to work with components that use a comm.
You could use an MPI spawn if you want to. That approach has been done, but it's far from ideal. You will be much more efficient if you can wrap the code in memory and let OpenMDAO pass you a comm.
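To make the "pass a comm down" part concrete, the usual trick with mpi4py is Comm.py2f(), which turns the communicator into the integer handle Fortran MPI routines expect. A sketch, where my_fem and run_fem are hypothetical names for the f2py-wrapped FEM code:
# Sketch: hand the communicator to an f2py-wrapped Fortran routine.
# "my_fem" / "run_fem" are hypothetical; the wrapping itself is done
# with f2py as described above.
from mpi4py import MPI
import my_fem  # hypothetical f2py-built extension module

def run_fem_on(comm):
    fcomm = comm.py2f()      # integer Fortran handle for the communicator
    my_fem.run_fem(fcomm)    # Fortran side uses this handle directly

# Inside an OpenMDAO component you would pass the comm OpenMDAO gives you
# (e.g. self.comm) rather than MPI.COMM_WORLD as in this standalone example.
run_fem_on(MPI.COMM_WORLD)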
Sometimes a Python program stops with an exception like the following, when there is not enough memory:
OSError: [Errno 12] Cannot allocate memory
Can I make it wait until memory is available again instead of dying unrecoverably?
Or at least freeze until the user sends a SIGCONT or something to it?
It's not my program, so I'd prefer not to modify its source code, but I think it would still be cool if I could do this by modifying only the outermost calling part.
Thank you!
You can catch the OSError exception, but this may not help your program continue from where it left off.
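As a sketch of what that catch looks like at the outermost calling layer (assuming you control a wrapper around the program's entry point): the retry only helps if the failed call can simply be repeated, because any state lost inside it is gone.
# Sketch: retry the outermost call when the allocation failure surfaces
# as OSError with ENOMEM, as in the question. The wrapped function is
# whatever your outermost calling part invokes.
import errno
import time

def call_with_memory_retry(func, *args, **kwargs):
    while True:
        try:
            return func(*args, **kwargs)
        except OSError as exc:
            if exc.errno != errno.ENOMEM:
                raise
            time.sleep(30)   # wait, hope memory frees up, then retry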
To do this well you need to interpose some code between Python and malloc. You can do that using LD_PRELOAD as per the details here: How can I limit memory acquired with `malloc()` without also limiting stack?
The idea is you implement a wrapper for malloc which calls the real malloc and waits to retry if it fails. If you prefer not to use LD_PRELOAD then building Python with your interposing code baked in is a possibility (but a bit more work).
The library you'll write for LD_PRELOAD will end up being usable with just about any program written in C or C++. You could even open-source it. :)
So I am working on a Matlab application that has to do some communication with a Python script. The script that is called is a simple piece of client software. As a side note, if it were possible to have a Matlab client and a Python server communicating, that would solve this issue completely, but I haven't found a way to make that work.
Anyhow, after searching the web I have found two ways to call Python scripts: either with the system() command or by editing the perl.m file to call Python scripts instead. Both ways are too slow, though (tic/toc puts them at > 20 ms, and the call must run in under 6 ms), as this call will be in a loop that is very time sensitive.
As a solution, I figured I could instead save a file at a certain location and have my Python script continuously check for this file, executing the command I want when it finds it. After timing each of these steps and summing them up, I found this to be very much faster (almost 100x, so certainly fast enough), and I can't really believe that, or rather I can't understand why calling Python scripts is so slow (not that I have more than a superficial knowledge of the subject). I also find this solution really messy and ugly, so I just wanted to check: first, is it a good idea, and second, is there a better one?
Finally, I realize that Python's time.time() and Matlab's tic/toc might not be precise enough to measure time correctly on that scale, which is another reason why I'm asking.
Spinning up new instances of the Python interpreter takes a while. If you spin up the interpreter once, and reuse it, this cost is paid only once, rather than for every run.
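One way to reuse the interpreter is to keep a single Python process running as a small localhost server and have Matlab send it a short command per iteration (for instance with tcpclient), so each call costs a socket round-trip rather than an interpreter start-up. A sketch of the Python side; the port number and the one-line command protocol are assumptions:
# Sketch: long-running Python worker listening on localhost, so the
# interpreter is started once. Port and protocol are placeholders.
import socket

HOST, PORT = '127.0.0.1', 50007   # hypothetical port

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind((HOST, PORT))
srv.listen(1)
while True:
    conn, _ = srv.accept()
    command = conn.recv(1024).decode().strip()
    # ... do the real work for `command` here ...
    conn.sendall(b'done\n')
    conn.close()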
This is normal (expected) behaviour, since startup includes large numbers of allocations and imports. For example, on my machine, the startup time is:
$ time python -c 'import sys'
real 0m0.034s
user 0m0.022s
sys 0m0.011s
I have a Python script that allocates a huge amount of memory and eventually runs out of it. Is there any way nosetests can handle this gracefully?
Unfortunately the only way to survive such a thing would be to have your test fixture run that particular test in a subcommand using subprocess.Popen(), and capture its output and error code so that you can see the nonzero error code and “out of memory” traceback that result. Note that sys.executable is the full path to the current Python executable, if that helps you build a Popen() command line to run Python on the little test script that runs out of memory.
Once a process is out of memory, there is typically no way to recover, because nearly anything it might try to do (format a string to print out, for example) takes even more memory, which is, by definition, now exhausted. :)
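A sketch of what that fixture might look like, with "allocate_huge.py" standing in as a placeholder for the little script that runs out of memory:
# Sketch: run the memory-hungry code in a child interpreter and check
# the failure from the (still healthy) test process.
# "allocate_huge.py" and the expected traceback text are assumptions;
# MemoryError is what a pure-Python over-allocation raises.
import subprocess
import sys

def test_huge_allocation_fails_cleanly():
    proc = subprocess.Popen(
        [sys.executable, 'allocate_huge.py'],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()
    assert proc.returncode != 0          # child died, parent did not
    assert b'MemoryError' in err         # traceback captured from the child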