MPI error - channel initialization failed - python

I am running a simple Fortran 90 program (Hello world) with MPI.
Compilation works fine, and running ./a.out directly prints the message, but as soon as I run
$mpiexec -n 1 exec ./a.out
I get two errors:
channel initialization failed
and
gethostbyname failed
A similar Python script fails with an error as well.
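A common cause of "gethostbyname failed" with MPI launchers is that the machine's own hostname does not resolve (for example, because it is missing from /etc/hosts). Whether that applies here is an assumption, since the question does not show the host configuration. A minimal sketch that checks this from Python; the function name hostname_resolves is made up for illustration:

```python
import socket


def hostname_resolves(hostname=None):
    """Return (ok, detail) for a lookup of the given name, or of the local hostname."""
    name = hostname or socket.gethostname()
    try:
        address = socket.gethostbyname(name)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        return False, "{}: {}".format(name, exc)
    return True, "{} -> {}".format(name, address)


if __name__ == "__main__":
    ok, detail = hostname_resolves()
    print("resolves" if ok else "does NOT resolve", "-", detail)
```

If the local hostname does not resolve, adding it to /etc/hosts is the usual thing to try before looking at the MPI installation itself.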

Related

MPI/python application trace

I have an MPI Python application that I am trying to trace with Intel Trace Analyzer and Collector. The application runs on an HPC cluster. To produce the .stf trace files that Intel Trace Analyzer takes as input, I followed this guide: https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-itac/top/trace-your-mpi-application.html and ran this command on Linux: mpiexec -trace -n 2 python3 ./run.py
However, I get the following error:
Error: mpiexec: Error: unknown option "-trace"
Running just mpiexec -n 2 python3 ./run.py works, but as soon as I add the -trace option it fails.
Thank you in advance.
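The linked guide describes -trace as an option of the Intel MPI launcher, so an "unknown option" error may mean that the mpiexec found on PATH belongs to a different MPI implementation. That is a guess, not something the question confirms. A small sketch to see which launcher is actually picked up; describe_launcher is a hypothetical helper, and it assumes the launcher accepts --version (Open MPI and MPICH's Hydra do):

```python
import shutil
import subprocess


def describe_launcher(name="mpiexec"):
    """Return the resolved path and the --version banner of a launcher."""
    path = shutil.which(name)
    if path is None:
        return {"path": None, "banner": None}
    try:
        result = subprocess.run(
            [path, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=10,
        )
        banner = result.stdout.strip()
    except (OSError, subprocess.TimeoutExpired) as exc:
        banner = "could not run --version: {}".format(exc)
    return {"path": path, "banner": banner}


if __name__ == "__main__":
    info = describe_launcher()
    print("mpiexec path:", info["path"])
    print(info["banner"])
```

If the banner names Open MPI or MPICH rather than Intel MPI, loading the Intel MPI environment (or module) before calling mpiexec would be the next thing to check.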

How to solve an error when running MPI on Amazon EC2?

I have written code using mpi4py and want to run it on an EC2 instance. When I use this command:
mpiexec -n 2 python -m mpi4py ./MyCode/test4.py
I get this error:
[mpiexec@ip-172-31-18-30] HYDU_create_process (utils/launch/launch.c:74): execvp error on file hydra_pmi_proxy (No such file or directory)
[mpiexec@ip-172-31-18-30] HYD_pmcd_pmiserv_proxy_init_cb (pm/pmiserv/pmiserv_cb.c:448): assert (!closed) failed
[mpiexec@ip-172-31-18-30] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@ip-172-31-18-30] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:196): error waiting for event
[mpiexec@ip-172-31-18-30] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion
Does anyone have any idea how to solve it?
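The first line of the log says that mpiexec could not execute hydra_pmi_proxy, so one thing worth checking is whether that binary is on PATH or installed next to the mpiexec in use. The sketch below only reports what it finds. The helper name find_proxy is hypothetical, and the idea that the proxy normally sits beside the launcher is an assumption about a typical MPICH install:

```python
import os
import shutil


def find_proxy(launcher="mpiexec", proxy="hydra_pmi_proxy"):
    """Report where the launcher is and whether the proxy can be found."""
    launcher_path = shutil.which(launcher)
    report = {
        "launcher": launcher_path,
        "proxy_on_path": shutil.which(proxy),
        "proxy_beside_launcher": None,
    }
    if launcher_path:
        # Follow symlinks so we look in the real installation directory.
        bin_dir = os.path.dirname(os.path.realpath(launcher_path))
        candidate = os.path.join(bin_dir, proxy)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            report["proxy_beside_launcher"] = candidate
    return report


if __name__ == "__main__":
    for key, value in find_proxy().items():
        print("{}: {}".format(key, value))
```

If both proxy entries come back as None, the MPI installation is probably incomplete or mixed with another one, and reinstalling it or fixing PATH is the likely remedy.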

mpi4py only works under mpiexec

I have set up mpi4py on a new server, and it isn't quite working. When I import mpi4py.MPI, it crashes. However, if I do the same thing under mpiexec, it works. On my other server and on my workstation, both techniques work fine. What am I missing on the new server?
Here's what happens on the new server:
$ python -c 'from mpi4py import MPI; print("OK")'
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
PMI2_Job_GetId failed failed
--> Returned value (null) (14) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_init failed
--> Returned value (null) (14) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "(null)" (14) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[Octomore:45430] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
If I run it with mpiexec, it's fine.
$ mpiexec -np 1 python -c 'from mpi4py import MPI; print("OK")'
OK
I'm running on CentOS 6.7. I've installed Python 2.7 as a software collection, and I've loaded the openmpi/gnu/1.10.2 module. MPICH and MPICH2 are also installed, so they may be conflicting with OpenMPI. I haven't loaded the MPICH modules, though. I'm running Python in a virtualenv:
$ pip list
mpi4py (2.0.0)
pip (8.1.2)
setuptools (18.0.1)
wheel (0.24.0)
It turned out that mpi4py 2.0.0 does not work with Open MPI 1.10.2 on this server. It works fine with version 1.6.5.
$ module load openmpi/gnu/1.6.5
$ python -c 'from mpi4py import MPI; print("OK")'
OK
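When several MPI installations coexist, it can help to see which compiler wrapper mpi4py was built with, without importing mpi4py.MPI (the import that crashes above). A minimal sketch, assuming Python 3 (for shutil.which) rather than the Python 2.7 used in the question; mpi4py_build_info is a made-up name, while mpi4py.get_config() is part of mpi4py:

```python
import shutil


def mpi4py_build_info():
    """Return mpi4py's build configuration, or None if mpi4py is not installed."""
    try:
        import mpi4py  # importing the package alone does not initialize MPI
    except ImportError:
        return None
    return {
        "mpi4py_version": mpi4py.__version__,
        "build_config": dict(mpi4py.get_config()),
        "mpicc_on_path": shutil.which("mpicc"),
    }


if __name__ == "__main__":
    info = mpi4py_build_info()
    if info is None:
        print("mpi4py is not installed in this environment")
    else:
        for key, value in info.items():
            print("{}: {}".format(key, value))
```

If the mpicc recorded in the build configuration differs from the mpicc currently on PATH, mpi4py may have been compiled against a different MPI than the one loaded at run time.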

Python on Windows calling 'make -j4' via subprocess hangs, -j1 works

While porting a build script to Windows I noticed that I cannot call make with the parallel build option -j from within Python:
subprocess.call("make -j4 -f Makefile.win32 target".split())
This is on Windows 7 (in a VM), with Python 3.4.2 (or 2.7.8) and GNU Make 3.81 from MinGW. Make itself calls cl.exe to compile about 40 C files. When I kill the offending make.exe in Task Manager, this is the output:
make: *** [target] Error 1
make: INTERNAL: Exiting with 1 jobserver tokens available; should be 4!
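The sketch below does not fix the hang. It only shows one way to keep the calling script from blocking forever, by giving the child process a timeout and killing it when the timeout expires. It uses Popen.communicate(timeout=...), which is available from Python 3.3, so it does not apply to the 2.7.8 setup; run_with_timeout is a hypothetical helper name:

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (returncode, timed_out); kill the child on timeout."""
    proc = subprocess.Popen(cmd)
    try:
        proc.communicate(timeout=timeout_s)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        proc.kill()         # kills only the direct child, not its own children
        proc.communicate()  # reap the process so returncode is set
        return proc.returncode, True


if __name__ == "__main__":
    # Harmless stand-in command; for the build this would be
    # "make -j4 -f Makefile.win32 target".split()
    rc, timed_out = run_with_timeout([sys.executable, "-c", "pass"], 30)
    print("returncode:", rc, "timed out:", timed_out)
```

With a timeout in place, the script can at least fall back to -j1, which the question reports as working, when the parallel build does not finish.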

Installing node.js on Windows cygwin error

I tried to install node.js by following instructions here:
https://github.com/joyent/node/wiki/Building-node.js-on-Cygwin-(Windows)
I got these errors after running ./configure:
$ ./configure
Checking for program g++ or c++ : /usr/bin/g++
2 [main] python 6768 C:\cygwin\bin\python.exe: *** fatal error - unable to remap \\?\C:\cygwin\lib\python2.6\lib-dynload\time.dll to same address as parent: 0x3A0000 != 0x3D0000
Stack trace:
Frame Function Args
002891E8 6102796B (002891E8, 00000000, 00000000, 00000000)
002894D8 6102796B (6117EC60, 00008000, 00000000, 61180977)
0028A508 61004F1B (611A7FAC, 61249144, 003A0000, 003D0000)
End of stack trace
3 [main] python 5292 fork: child 6768 - died waiting for dll loading, errno 11
/home/user/node/wscript:228: error: could not configure a cxx compiler!
I did a rebaseall on cygwin and ran configure again and got these errors:
$ ./configure
Checking for program g++ or c++ : /usr/bin/g++
Checking for program cpp : /usr/bin/cpp
Checking for program ar : /usr/bin/ar
Checking for program ranlib : /usr/bin/ranlib
Checking for g++ : ok
Checking for program gcc or cc : /usr/bin/gcc
2 [main] python 6100 C:\cygwin\bin\python.exe: *** fatal error - unable to remap \\?\C:\cygwin\lib\python2.6\lib-dynload\_functools.dll to same address as parent: 0x3A0000 != 0x3D0000
Stack trace:
Frame Function Args
002891E8 6102796B (002891E8, 00000000, 00000000, 00000000)
002894D8 6102796B (6117EC60, 00008000, 00000000, 61180977)
0028A508 61004F1B (611A7FAC, 6124976C, 003A0000, 003D0000)
End of stack trace
2 [main] python 4424 fork: child 6100 - died waiting for dll loading, errno 11
/home/user/node/wscript:230: error: could not configure a c compiler!
What am I doing wrong?
You can follow the steps on this page:
http://www.garethhunt.com/2008/02/11/cygwin-died-waiting-for-dll-loading/
It worked perfectly for me.
You can also avoid installing node.js through Cygwin and use the native Windows executable instead, which should be preferred unless you have a specific reason to do otherwise.
