I am building an MPI application using mpi4py (1.3.1) and OpenMPI (1.8.6-1) on Arch Linux ARM (on a Raspberry Pi cluster, to be more specific). I've run my program successfully on 3 nodes (4 processes), and when I try to add a new node, here's what happens:
Host key verification failed.
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
The funny thing is, the SSH keys are fine, since I'm using the same nodes: I can remove any entry from the host file, add the new node, and it will work, so I'm pretty sure the problem is not a misconfigured SSH setup. It only happens when I use 5 processes.
Could this be a bug in the library of some sort?
Here's my host file:
192.168.1.26 slots=2
192.168.1.188 slots=1
#192.168.1.202 slots=1  (if uncommented and run with -np 5, it raises the error)
192.168.1.100 slots=1
Thanks in advance!
I was having the same problem on a Linux x86_64 mini cluster running Fedora 22 and OpenMPI 1.8. I could SSH into any of my 5 machines from my launch machine, but when I tried to launch MPI with 3 or more nodes, it would give me an authentication error. Like you, it seemed that 3 was a magic number, and it turns out that it is: OpenMPI uses a tree-based launch, so when you have more than two nodes, one or more of the intermediate nodes execute an ssh themselves. In my case, I was not using a password-less setup; I had an SSH identity on the launch machine that I had added to my key chain. It was able to launch the first two nodes because that authenticated identity was in my key chain. Then each of those nodes tried to launch more, and those nodes did not have that key authenticated (I would have needed to add it on each of them).
So the solution appears to be moving to a password-less SSH identity setup, but you obviously have to be careful how you do that. I created a specific identity (key pair) on my launch machine. I added the key to the authorized keys on the nodes I want to use (which is easy since they are all using NFS, but you could manually distribute the key once if you need to). Then I modified my SSH config to use that password-less identity when connecting to my node machines. My ~/.ssh/config looks like:
Host node0
HostName node0
IdentityFile ~/.ssh/passwordless_rsa
Host node1
HostName node1
IdentityFile ~/.ssh/passwordless_rsa
...
I'm sure there is some way to scale this to N nodes with wildcards, as sketched below. Or you could consider changing the default identity file at the system level in the system-wide ssh config file (I bet a similar option is available there).
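For what it's worth, ssh_config accepts glob patterns in Host entries, so (assuming your nodes share a common name prefix like node, which is a guess on my part) a single stanza should cover all of them:

Host node*
    IdentityFile ~/.ssh/passwordless_rsa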
And that did the trick. Now I can spin up all 5 nodes without any authentication issues. The flaw in my thinking was that the launch node would launch all the other nodes, but this tree-based launch means you need to chain logins, which you cannot do with a passphrase-authenticated identity since you never get the chance to authenticate it.
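If you want to verify the chained-login theory on your own cluster, a quick sanity check (node0 and node1 are placeholders for any two of your nodes) is to nest one ssh inside another; if this prompts for a passphrase or fails, the tree-based launch will fail too:

ssh node0 ssh node1 hostname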
Having a password-less key still freaks me out, so to keep things extra safe on these nodes connected to an open network, I changed the sshd config (system level) to restrict logins to just my user coming from my launch node.
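For example (just a sketch; mpiuser and the launch node's name are placeholders for your own), a single AllowUsers line in each node's /etc/ssh/sshd_config locks logins down to one user coming from one host:

AllowUsers mpiuser@launchnode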
I'm trying to understand all the methods available to execute remote commands on Windows through the impacket scripts:
https://www.coresecurity.com/corelabs-research/open-source-tools/impacket
https://github.com/CoreSecurity/impacket
I understand the high-level explanation of psexec.py and smbexec.py, how they create a service on the remote end and run commands through cmd.exe /c, but I can't understand how you can create a service on a remote Windows host through SMB. Wasn't SMB supposed to be mainly for file transfers and printer sharing? Reading the source code, I see in the notes that they use DCERPC to create these services; is this part of the SMB protocol? All the resources on DCERPC I've found were kind of confusing, and not focused on its service-creating capabilities.
Looking at the source code of atexec.py, it says that it interacts with the Task Scheduler service of the Windows host, also through DCERPC. Can it be used to interact with all services running on the remote box?
Thanks!
DCERPC (https://en.wikipedia.org/wiki/DCE/RPC) is the original protocol, which was used as a template for MSRPC (https://en.wikipedia.org/wiki/Microsoft_RPC).
MSRPC is a way to execute functions on the remote end and to transfer data (parameters to these functions). It is not a way to directly execute remote OS commands on the remote side.
SMB (https://en.wikipedia.org/wiki/Server_Message_Block) is the file-sharing protocol mainly used to access files on Windows file servers. In addition, it provides Named Pipes (https://msdn.microsoft.com/en-us/library/cc239733.aspx), a way to transfer data between a local process and a remote process.
One common way to use MSRPC is via Named Pipes over SMB, which has the advantage that the security layer provided by SMB is reused directly for MSRPC.
In fact, MSRPC is one of the most important, yet least known, protocols in the Windows world.
Neither MSRPC nor SMB has anything to do with remote execution of shell commands.
One common way to execute remote commands is:
Copy files (via SMB) to the remote side (a Windows service EXE)
Create registry entries on the remote side (so that the copied Windows service is installed and startable)
Start the Windows service.
The started Windows service can use any network protocol (e.g. MSRPC) to receive commands and execute them.
After the work is done, the Windows service can be uninstalled (remove the registry entries and delete the files).
In fact, this is what PSEXEC does.
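To make that concrete, here is a rough, untested sketch of the service dance using impacket's scmr module, bound to the Service Control Manager over the \pipe\svcctl named pipe (the host, credentials, service name and EXE path are all placeholders, and this mirrors the steps above rather than psexec.py's exact code):

from impacket.dcerpc.v5 import transport, scmr

# MSRPC over a named pipe over SMB: exactly the layering described above
rpc = transport.DCERPCTransportFactory(r'ncacn_np:192.168.1.10[\pipe\svcctl]')
rpc.set_credentials('Administrator', 'secret')
dce = rpc.get_dce_rpc()
dce.connect()
dce.bind(scmr.MSRPC_UUID_SCMR)

# Open the SCM, create a service whose EXE was copied earlier via SMB,
# start it, then clean up again (steps 2, 3 and 5 above); the SCM itself
# writes the registry entries when the service is created
sc_handle = scmr.hROpenSCManagerW(dce)['lpScHandle']
resp = scmr.hRCreateServiceW(dce, sc_handle, 'ProbeSvc\x00', 'ProbeSvc\x00',
                             lpBinaryPathName='C:\\Windows\\Temp\\probe.exe\x00')
svc_handle = resp['lpServiceHandle']
scmr.hRStartServiceW(dce, svc_handle)
scmr.hRDeleteService(dce, svc_handle)
scmr.hRCloseServiceHandle(dce, svc_handle)
scmr.hRCloseServiceHandle(dce, sc_handle)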
All the resources on DCERPC I've found were kind of confusing, and not focused on its service-creating capabilities.
Yes, it's just a remote procedure call protocol. But it can be used to start a procedure on the remote side which can do just about anything, e.g. create a service.
Looking at the source code of atexec.py, it says that it interacts with the Task Scheduler service of the Windows host, also through DCERPC. Can it be used to interact with all services running on the remote box?
There are some MSRPC commands which handle the Task Scheduler, and others which handle generic service start and stop commands.
A few final words:
SMB / CIFS and the protocols around it are really complex and hard to understand. Trying to understand how to deal with e.g. remote service control is fine, but it can be a very long journey.
Perhaps this page (which uses Java to try to control a Windows service) may also help:
https://dev.c-ware.de/confluence/pages/viewpage.action?pageId=15007754
I have a parallel Python program written with mpi4py, and I'm trying to make it distributed. I set up a virtual machine, installed OpenMPI and an OpenSSH server, exchanged keys, and all that. On the local machine I have this hostfile:
127.0.0.1 slots=4
192.168.1.104 slots=2
and I try to run the program with:
mpirun -np 2 --hostfile hostfile python2 algen.py 0.85 0.02 20 70
but I get the following error:
[Kreutz:13090] tcp_peer_recv_connect_ack: invalid header type: 0
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
And I don't know what to do now. Do you have any ideas about what I could try?
I'm implementing Bluetooth on an embedded device and have a few questions about the BlueZ protocol stack. I'm using BlueZ-4.101 (I do not have the option to upgrade to BlueZ-5), and do not have Python available.
Here are my questions after spending some time looking into BlueZ:
Is bluetoothd needed in my situation? As in, is it just a daemon that handles D-Bus messages between user-space and the kernel, or is it more? I've looked through the source and mostly find D-Bus-related calls
How does one determine the value of DeviceID in /etc/bluetooth/main.conf? I found these instructions (section 3.4), but they are for a different platform using BlueZ 5
Will sdptool work without setting the DeviceID value? I've tried the following command and receive timeouts every time (only for my local device):
# sdptool browse local
Browsing FF:FF:FF:00:00:00 ...
Service Search failed: Connection timed out
Is it viable to replace all of the Python simple-agent scripts with libbluetooth instead, or do I need to try to port them over to a supported scripting language?
Any help would be greatly appreciated!!!
If more logs are needed I can try and get them.
I have a network of end-user machines (Windows, Linux, macOS) and I want to check whether the credentials I have allow me to access the machines as administrator (I am checking the "here are the admin credentials to the machines" claim against reality).
I wrote a Python script (it runs on Linux) which
runs nmap -O on the network to gather the hosts
tries to SSH in with paramiko to check the Linux credentials, roughly as in the sketch below.
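The paramiko check looks roughly like this (host, user and password are parameters; a successful connect() means the credentials are valid):

import socket
import paramiko

def linux_creds_ok(host, user, password):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        client.connect(host, username=user, password=password, timeout=5)
        return True
    except (paramiko.AuthenticationException, paramiko.SSHException, socket.error):
        return False
    finally:
        client.close()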
I would like to do a similar check for the Windows machines. What would be a practical way, in Python, to do so?
I have a few sets of credentials (AD or local to a machine), so I would need a somewhat universal method. I was thinking about something like a call to _winreg.ConnectRegistry, but it does not import on my Linux box (it does on a Windows box).
I am no sysadmin, but just trying to mount the C drive ( \\hostname\C$ ) via Samba/SMB should work. This assumes that remote sharing and filesystem access are enabled on that box and the firewall rules are set up to allow remote connections.
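If you want to stay in Python on the Linux side, a sketch with impacket's SMBConnection could do the equivalent check (host and credentials are placeholders); it relies on the common convention that only administrators may connect to the ADMIN$ share:

from impacket.smbconnection import SMBConnection

def windows_admin_ok(host, user, password, domain=''):
    try:
        conn = SMBConnection(remoteName=host, remoteHost=host)
        conn.login(user, password, domain)
        conn.connectTree('ADMIN$')  # normally admin-only; raises on access denied
        conn.logoff()
        return True
    except Exception:
        # connection refused, bad credentials, or access denied
        return False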
This is my scenario: I developed a Python desktop application which I use to probe the status of services/DBs on the very same machine it is running on.
My need is to monitor, using my application, two "brother" Windows Server 2003 hosts (the Python version is 2.5 on both). One of the hosts lies in my own LAN; the other one lies in another LAN which is reachable via VPN.
The application is composed by:
A Graphical User Interface (gui.py), which provides widgets to collect user inputs and launches the...
...business-logic script (console.py), which in turn invokes slave Python scripts that check the system's services and DB usage/account status/etc. The textual output of those checks is then returned to the GUI.
I used to execute the application directly on each of the two machines, but it would be great to turn it into a client/server application, so that:
users will just run gui.py locally
gui.py will communicate parameters to server remakes of console.py, which will be running on both of the Windows hosts
the servers will then execute the system checks and report the results back to the client GUIs, which will display them (as sketched below).
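Schematically, the kind of exchange I have in mind (just a sketch; the port number and function name are made up) could be as simple as Python 2.5's built-in XML-RPC modules, which run as plain user processes:

# console.py remake, running on each Windows host as a normal user process
import SimpleXMLRPCServer

def run_checks(params):
    # placeholder for the real check logic currently in console.py
    return "results for %r" % (params,)

server = SimpleXMLRPCServer.SimpleXMLRPCServer(("0.0.0.0", 8000))
server.register_function(run_checks)
server.serve_forever()

# gui.py side, on the user's machine
import xmlrpclib
proxy = xmlrpclib.ServerProxy("http://winhost:8000")
print proxy.run_checks(["db_status", "service_status"])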
I thought about two possible solutions:
Create a Windows service on each of the Windows hosts, basically executing console.py's code and waiting for incoming requests from the clients
Open SSH connections from any LAN host to the chosen Windows host and directly run console.py on it.
I am working in a corporate environment, which has some network and host constraints: many network protocols (like SSH) are filtered by our corporate firewall. Furthermore, I don't have administration privileges on the Windows hosts, so I can't install system services on them... this is frustrating!
I just wanted to ask if there is any other way to make gui.py and console.py communicate over the network which I did not take into account. Does anyone have any suggestions? Please note that - if possible - I'm not going to ask the ICT department to give me administration privileges on the Windows hosts!
Thanks in advance!
Answer to myself: I found one possible solution.
I'm lucky, because console.py actually invokes many slave Python scripts, each of them performing one single system check via standard third-party command-line tools which can also be pointed at remote hosts.
Then, what I did was to modify gui.py and console.py so that users can parametrically specify on which Windows host the checks must be carried out.
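Schematically, each slave script now does something like this ('sc' and the service name are just a hypothetical example of such a command-line tool; the real tools I use are different):

import subprocess

def check_remote_service(host, service):
    # Many Windows CLI tools accept a remote machine as \\host;
    # 'sc' is one hypothetical example of such a tool.
    p = subprocess.Popen(["sc", "\\\\" + host, "query", service],
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = p.communicate()
    return out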
In this way, I obtain a distributed application... but I've been lucky; what if one or more of the third-party CLI tools did not support checking features on remote hosts?