I have set up mpi4py on a new server, and it isn't quite working. When I import mpi4py.MPI, it crashes. However, if I do the same thing under mpiexec, it works. On my other server and on my workstation, both techniques work fine. What am I missing on the new server?
Here's what happens on the new server:
$ python -c 'from mpi4py import MPI; print("OK")'
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
PMI2_Job_GetId failed failed
--> Returned value (null) (14) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_init failed
--> Returned value (null) (14) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "(null)" (14) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[Octomore:45430] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
If I run it with mpiexec, it's fine.
$ mpiexec -np 1 python -c 'from mpi4py import MPI; print("OK")'
OK
I'm running on CentOS 6.7. I've installed Python 2.7 as a software collection, and I've loaded the openmpi/gnu/1.10.2 module. MPICH and MPICH2 are also installed, so they may be conflicting with OpenMPI. I haven't loaded the MPICH modules, though. I'm running Python in a virtualenv:
$ pip list
mpi4py (2.0.0)
pip (8.1.2)
setuptools (18.0.1)
wheel (0.24.0)
It turned out that mpi4py is not compatible with version 1.10.2 of OpenMPI. It works fine with version 1.6.5.
$ module load openmpi/gnu/1.6.5
$ python -c 'from mpi4py import MPI; print("OK")'
OK
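For anyone hitting the same thing, a quick sanity check worth doing (a sketch, not part of the original diagnosis) is to compare the MPI that mpi4py was built against with the module you have loaded. mpi4py.get_config() only reads the build metadata, so it works even when importing MPI would abort, and MPI.get_vendor() reports the library actually loaded at run time:
$ python -c 'import mpi4py; print(mpi4py.get_config())'
$ mpiexec -np 1 python -c 'from mpi4py import MPI; print(MPI.get_vendor())'
If those disagree with the loaded openmpi module, rebuilding the package against it is worth trying, for example pip install --no-cache-dir --force-reinstall mpi4py inside the virtualenv with the module loaded.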
Related
I was trying to execute tests for my package on a remote machine over SSH from my master node. Both nodes have the same versions of the packages installed.
I'm running the tests like this:
pytest -d --tx ssh=ubuntu//python=python3 --rsyncdir /home/ubuntu/pkg/ /home/ubuntu/pkg -n 7
On running this, I get the following error:
------------------------------ coverage ------------------------------
---------------------- coverage: failed workers ----------------------
The following workers failed to return coverage data, ensure that pytest-cov is installed on these workers.
gw0
gw1
gw2
gw3
gw4
gw5
gw6
Coverage XML written to file coverage.xml
I've made sure that coverage is installed on the worker node:
coverage==6.2
pytest-cov==3.0.0
I don't know why it is still failing.
I also noticed that the code files have not been synced to the worker machine for some reason.
I'm trying to understand what is going wrong here and how to fix it.
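For what it's worth, a couple of checks that may narrow this down (assuming the remote python3 is the same interpreter named in the --tx spec) are to confirm over SSH what that interpreter actually sees:
$ ssh ubuntu 'python3 -m pip show pytest-cov coverage'
$ ssh ubuntu 'python3 -m pytest --version'
If pytest-cov is not installed for that particular interpreter, the failed-workers message would be expected.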
I tried to install Theano and Keras on Pydroid 3 (Android), which succeeded, but when running Keras, Theano was not being used as the backend. So I installed Ubuntu 20 under Termux and installed Keras and Theano with the following command:
apt install python3-keras --no-install-recommends && apt install python3-theano --no-install-recommends
The installation succeeded. To set the backend to Theano I looked for ~/.keras/keras.json, but it wasn't there. When I ran my test script anyway, it gave me the following error:
root@localhost:~# python3 testkeras.py
[localhost:21091] opal_ifinit: ioctl(SIOCGIFHWADDR) failed with errno=13
[localhost:21092] opal_ifinit: ioctl(SIOCGIFHWADDR) failed with errno=13
[localhost:21092] pmix_ifinit: ioctl(SIOCGIFHWADDR) failed with errno=13
[localhost:21092] oob_tcp: problems getting address for index 88256 (kernel index -1)
--------------------------------------------------------------------------
No network interfaces were found for out-of-band communications. We require
at least one available network for out-of-band messaging.
--------------------------------------------------------------------------
[localhost:21091] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 716
[localhost:21091] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 172
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_init failed
--> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[localhost:21091] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1.) I want to know what the problem was.
2.) Suggestions are welcome.
The code that I ran, in case anyone wants to know:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
And here are the modules I have installed:
Package Version
------------------- -------
decorator 4.4.2
h5py 2.10.0
Keras 2.2.4
Keras-Applications 1.0.6
Keras-Preprocessing 1.0.5
mpi4py 3.0.3
numpy 1.17.4
pip 20.0.2
PyYAML 5.3.1
scipy 1.3.3
setuptools 45.2.0
six 1.14.0
Theano 1.0.4
wheel 0.34.2
And I'm new to this machine learning field.
Some other information on the system
root@localhost
--------------
OS: Ubuntu 20.04 LTS focal aarch64
Kernel: 4.4.147+
Uptime: 18805 days, 10 hours, 9 min
Packages: 202 (dpkg)
Shell: bash 5.0.16
Terminal: proot
CPU: Unisoc SC9863a (8) @ 1.200GHz
Memory: 957MiB / 1819MiB
Thank you Dr. Snoopy, I finally got it working correctly. I had to delete the OS and reinstall it (perhaps unnecessarily) using apt install proot-distro from Termux, but I think the real problem was the command
apt install python3-keras --no-install-recommends; as you said, there were some unsatisfied dependencies or platform inconsistencies.
Okay, finally, the steps to get it working:
1. Enter the following command in Termux: apt install proot-distro && proot-distro install ubuntu-18.04 && apt install python3-keras
2. Then use your favourite text editor to edit keras.json; I use vim, in this case: vim ~/.keras/keras.json
The file would be like the following:
{
"floatx": "float32",
"epsilon": 1e-07,
"backend": "tensorflow",
"image_data_format": "channels_last"
}
and change the value of "backend" to theano (in my case I added theano).
The file should then look like the following:
{
"floatx": "float32",
"epsilon": 1e-07,
"backend": "theano",
"image_data_format": "channels_last"
}
Then save the file, and test it by opening the Python interactive shell and entering import keras.
The output should be Using Theano backend.
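As an alternative (this is standard Keras behaviour rather than anything specific to this setup), the backend can also be selected per run with the KERAS_BACKEND environment variable, which takes precedence over keras.json:
KERAS_BACKEND=theano python3 -c "import keras"
which should likewise print Using Theano backend.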
I think this should help someone out there.
I would like to use pygradle in a multi-project setup with project dependencies. I created two Gradle sub-projects: a python-cli project (example-app) and a python-sdist project (example-lib), on which the CLI project depends.
But currently I'm facing the following error (gist) when I try to build the app:
multi-project-example/example-app> gradle build --info
> Task :example-app:buildPex FAILED
Task ':example-app:buildPex' is not up-to-date because:
Task has not declared any outputs despite executing actions.
Starting process 'command '/home/kkdh/Projects/pygradle/examples/multi-project-example/example-app/build/venv/bin/python''. Working directory: /home/kkdh/Projects/pygradle/examples/multi-project-example/example-app Command: /home/kkdh/Projects/pygradle/examples/multi-project-example/example-app/build/venv/bin/python /home/kkdh/Projects/pygradle/examples/multi-project-example/example-app/build/venv/bin/pip freeze --all --disable-pip-version-check
Successfully started process 'command '/home/kkdh/Projects/pygradle/examples/multi-project-example/example-app/build/venv/bin/python''
/home/kkdh/Projects/pygradle/examples/multi-project-example/example-app/build/deployable/bin/example-app.pex
Starting process 'command '/home/kkdh/Projects/pygradle/examples/multi-project-example/example-app/build/venv/bin/python''. Working directory: /home/kkdh/Projects/pygradle/examples/multi-project-example/example-app Command: /home/kkdh/Projects/pygradle/examples/multi-project-example/example-app/build/venv/bin/python /home/kkdh/Projects/pygradle/examples/multi-project-example/example-app/build/venv/bin/pex --no-pypi --cache-dir /home/kkdh/Projects/pygradle/examples/multi-project-example/example-app/build/pex-cache --output-file /home/kkdh/Projects/pygradle/examples/multi-project-example/example-app/build/deployable/bin/example-app.pex --repo /home/kkdh/Projects/pygradle/examples/multi-project-example/example-app/build/wheel-cache --python-shebang /home/kkdh/.anaconda3/bin/python UNKNOWN==0.0.0 example-app==0.3.0a1
Successfully started process 'command '/home/kkdh/Projects/pygradle/examples/multi-project-example/example-app/build/venv/bin/python''
Could not satisfy all requirements for example-lib:
example-lib(from: example-app==0.3.0a1)
:example-app:buildPex (Thread[Execution worker for ':',5,main]) completed. Took 1.165 secs.
You will find the example in my fork of pygradle: https://github.com/kKdH/pygradle/tree/master/examples/multi-project-example
I opened an issue about this problem but got no response from the project maintainers, so now I'm asking here for any pointers to a solution or further troubleshooting steps.
Our application is one of the few left running on DEA. On DEA we were able to use a specific custom buildpack:
https://github.com/ihuston/python-conda-buildpack
Now that we have to move to the Diego runtime, we run out of space while pushing the app. I believe the extra disk space is only required during staging, because quite a few libraries come with the buildpack and have to be built (we need the whole scientific Python stack, which is all included in the above buildpack).
The build script outputs everything fine, except that the app cannot start. The logs then show:
2016-10-13T19:10:42.29+0200 [CELL/0] ERR Copying into the container failed: stream-in: nstar: error streaming in: exit status 2. Output: tar: ./app/.conda/pkgs/cache/db552c1e.json: Wrote only 8704 of 10240 bytes
and the same for many further files:
2016-10-13T19:10:42.29+0200 [CELL/0] ERR tar: ./app/.conda/pkgs/cache/9779607c273dc0786bd972b4cb308b58.png: Cannot write: No space left on device
and then
2016-10-13T20:16:48.30+0200 [API/0] OUT App instance exited with guid b2f4a1be-aeda-44fa-87bc-9871f432062d payload: {"instance"=>"", "index"=>0, "reason"=>"CRASHED", "exit_description"=>"Copying into the container failed", "crash_count"=>14, "crash_timestamp"=>1476382608296511944, "version"=>"ca10412e-717a-413b-875a-535f8c3f7be4"}
When trying to request more disk quota (above 1 GB), there is an error:
Server error, status code: 400, error code: 100001, message: The app is invalid: disk_quota too much disk requested (must be less than 1024)
Is there a way to give a bit more space? At least for the build process?
You can use a .cfignore file, just like a .gitignore file, to exclude any unneeded files from being pushed with cf push. If you really push only what is necessary, the disk space may be sufficient.
https://docs.developer.swisscom.com/devguide/deploy-apps/prepare-to-deploy.html#exclude
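A minimal .cfignore might look like the following (the entries are only illustrative; adjust them to your repository layout):
.git/
tests/
docs/
*.pyc
__pycache__/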
The conda installer from https://github.com/ihuston/python-conda-buildpack installs the Intel MKL library by default. That is usually a good thing, but here it seemingly uses too much space, so the app cannot be deployed.
I adapted the buildpack and added the nomkl flag to the line
$CONDA_BIN/conda install --yes --quiet --file "$BUILD_DIR/conda_requirements.txt"
so that it becomes
$CONDA_BIN/conda install nomkl --yes --quiet --file "$BUILD_DIR/conda_requirements.txt"
As described in Continuum's blog post here:
https://www.continuum.io/blog/developer-blog/anaconda-25-release-now-mkl-optimizations
This then uses OpenBLAS instead and results in a much smaller droplet (175 MB instead of 330 MB), so the deployment can finish successfully.
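Alternatively (untested with this particular buildpack, but it is plain conda behaviour), nomkl can be listed directly in conda_requirements.txt instead of patching the install command, since conda install --file reads one package spec per line:
nomkl
numpy
scipy
The package names after nomkl are only examples; keep whatever your app already requires.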
I'm attempting to run stratum-mining-proxy with minerd. The proxy starts and runs with the following command:
python ./mining_proxy.py -o ltc-stratum.kattare.com -p 3333 -pa scrypt
The proxy starts fine. Then I run minerd (username/password removed):
minerd -a scrypt -r 1 -s 6 -o http://127.0.0.1:3333 -O USERNAME.1:PASSWORD
The following errors are received. This one is from the proxy:
2013-07-18 01:33:59,981 ERROR protocol protocol.dataReceived # Processing of message failed
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/stratum-0.2.12-py2.7.egg/stratum/protocol.py", line 185, in dataReceived
self.lineReceived(line, request_counter)
File "/usr/local/lib/python2.7/dist-packages/stratum-0.2.12-py2.7.egg/stratum/protocol.py", line 216, in lineReceived
raise custom_exceptions.ProtocolException("Cannot decode message '%s'" % line)
'rotocolException: Cannot decode message 'POST / HTTP/1.1
And this one is from minerd. What am I doing wrong? Any help is appreciated!
[2013-07-18 01:33:59] HTTP request failed: Empty reply from server
[2013-07-18 01:33:59] json_rpc_call failed, retry after 30 seconds
I'm a little curious; I don't know this as a fact, but I was under the impression that the mining proxy was for BTC, not LTC.
Anyway, I believe I got a similar message when I first installed it as well. To fix it, or rather to actually get it running, I had to use the Git installation method instead of installing manually.
Installation on Linux using Git
This is an advanced option for experienced users, but it gives you the easiest way to update the proxy.
1. git clone git://github.com/slush0/stratum-mining-proxy.git
2. cd stratum-mining-proxy
3. sudo apt-get install python-dev # The Python development package is necessary
4. sudo python distribute_setup.py # This upgrades the setuptools package
5. sudo python setup.py develop # This installs the required dependencies (namely the Twisted and Stratum libraries) but doesn't install the package into the system.
6. You can start the proxy by typing "./mining_proxy.py" in a terminal window. Using default settings, the proxy connects to Slush's pool interface.
7. If you want to connect to another pool or change other proxy settings, type "./mining_proxy.py --help".
8. If you want to update the proxy, type "git pull" in the package directory.
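One more thing that might be relevant to the original error (this is a guess on my part, inferred from the ProtocolException showing an HTTP POST arriving on the stratum port): minerd speaks getwork over HTTP, so it should be pointed at the proxy's getwork listener (port 8332 by default; check ./mining_proxy.py --help) rather than at the stratum port 3333, along the lines of:
minerd -a scrypt -r 1 -s 6 -o http://127.0.0.1:8332 -O USERNAME.1:PASSWORD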