I am trying to make use of multiprocessing across several different computers, which pathos seems geared towards: "Pathos is a framework for heterogenous computing. It primarily provides the communication mechanisms for configuring and launching parallel computations across heterogenous resources." In looking at the documentation, however, I am at a loss as to how to get a cluster up and running. I am looking to:
Set up a remote server or set of remote servers with secure authentication.
Securely connect to the remote server(s).
Map a task across all CPUs in both the remote servers and my local machine using a straightforward API like pool.map in the standard multiprocessing package (like the pseudocode in this related question).
I do not see an example for (1), and I do not understand the tunnel example provided for (2); the example does not actually connect to an existing service on the localhost. I would also like to know if/how I can require this communication to come with a password/key of some kind that would prevent someone else from connecting to the server. I understand this uses SSH authentication, but absent a preexisting key, that only ensures the traffic is not read as it passes over the Internet; it does nothing to prevent someone else from hijacking the server.
I'm the pathos author. Basically, for (1) you can use pathos.pp to connect to another computer through a socket connection. pathos.pp has almost exactly the same API as pathos.multiprocessing, although with pathos.pp you can give the address and port of a remote host to connect to, using the keyword servers when setting up the Pool.
However, if you want to make a secure connection with SSH, it's best to establish an SSH tunnel (as in the example you linked to), and then pass localhost and the local port number to the servers keyword in Pool. This will then connect to the remote pp-worker through the SSH tunnel. See:
https://github.com/uqfoundation/pathos/blob/master/examples/test_ppmap2.py and
http://www.cacr.caltech.edu/~mmckerns/pathos.html
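For illustration, a minimal sketch of that tunneled setup, assuming a pp worker (ppserver) is already running on the remote machine and an SSH tunnel forwards a local port to it; the host name and both port numbers below are placeholders:

    from pathos.pp import ParallelPythonPool

    # assumes something like `ssh -N -L 6260:localhost:35000 user@remotehost`
    # is already forwarding local port 6260 to a ppserver on the remote machine
    pool = ParallelPythonPool(servers=('localhost:6260',))

    def busy(x):
        return x ** 2

    print(pool.map(busy, range(10)))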
Lastly, if you are using pathos.pp with a remote server, as above, you are already doing (3). However, it can be more efficient (for a sufficiently embarrassingly parallel set of jobs) to nest the parallel maps: first use pathos.pp.ParallelPythonPool to build a parallel map across servers, then, inside the function you are mapping with pathos.pp, call an N-way job using a parallel map from pathos.multiprocessing.ProcessingPool, as sketched below. This minimizes the communication across the remote connection.
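A rough sketch of that nesting, reusing the tunneled server from the previous example (the chunking and port are illustrative only):

    from pathos.pp import ParallelPythonPool
    from pathos.multiprocessing import ProcessingPool

    def local_chunk(chunk):
        # inner N-way parallel map across the local CPUs of whichever
        # machine this chunk landed on
        return ProcessingPool().map(lambda x: x ** 2, chunk)

    # outer map distributes whole chunks across the remote server(s)
    outer = ParallelPythonPool(servers=('localhost:6260',))
    chunks = [list(range(0, 10)), list(range(10, 20))]
    print(outer.map(local_chunk, chunks))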
Also, you don't need to give an SSH password if you have ssh-agent working for you. See: http://mah.everybody.org/docs/ssh. For parallel maps across remote servers, pathos assumes that ssh-agent is running, so you won't need to type your password every time a connection is made.
EDIT: added example code on your question here: Python Multiprocessing with Distributed Cluster
Related
I would like to have a server process (preferably Python) that accepts simple messages and multiple clients (again, preferably Python) that connect to the server and send messages to it. The server and clients will only ever be running on the same local machine and the OS is Linux based. The server will be automatically started by the OS and the clients started later independent of the server. I strongly want to avoid installing a whole separate messaging framework/server to do this. The messages will be simple strings such as "kick" or even just a single byte representing the message type. It also needs to know when a connection is made and lost.
From these requirements, I think named pipes would be a feasible solution, with a new instance of that pipe created for each client connection. However, when I search for examples, all of the ones I have come across deal with processes that are spawned from the same parent process and not independently started which means they can pass a parent reference to the child.
Windows seems to allow multiple instances of a named pipe (one for each client connection), but I'm unsure whether this is possible on a Linux-based OS.
Please could someone point me in the right direction, preferably with a basic example, even if it's just pseudo-code.
I've looked at the multiprocessing module in Python, but this seems to be oriented around the server and client sharing the same process or having one spawn the other.
Edit
May be important, the host device is not guaranteed to have networking capabilities (embedded device).
I've used zeromq for this sort of thing before. It's a relatively lightweight library that exposes this sort of functionality.
Otherwise, you could implement it yourself by binding a socket in the server process and having clients connect to it. This works fine for Unix domain sockets; just pass AF_UNIX when creating the socket, e.g.:
import socket

# bind a Unix domain socket at a fixed path and accept one client
# (remove any stale '/tmp/srv' file left by a previous run before binding)
with socket.socket(socket.AF_UNIX) as s:
    s.bind('/tmp/srv')
    s.listen(1)
    (c, addr) = s.accept()
    with c:
        c.send(b"hello world")
for the server, and:
import socket

with socket.socket(socket.AF_UNIX) as c:
    c.connect('/tmp/srv')    # connect to the server's socket path
    print(c.recv(8192))      # read and print one message
for the client.
Writing a protocol around this is more involved, which is where things like zmq really help: you can easily push JSON messages around.
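As an illustration, a minimal REQ/REP pair over a Unix domain socket with pyzmq (the ipc path and message contents are arbitrary):

    import zmq

    # server: reply to JSON messages arriving on a Unix domain socket
    ctx = zmq.Context()
    rep = ctx.socket(zmq.REP)
    rep.bind("ipc:///tmp/srv.ipc")
    msg = rep.recv_json()          # e.g. {"type": "kick"}
    rep.send_json({"ok": True})

and the client side:

    import zmq

    ctx = zmq.Context()
    req = ctx.socket(zmq.REQ)
    req.connect("ipc:///tmp/srv.ipc")
    req.send_json({"type": "kick"})
    print(req.recv_json())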
I am developing an application on Heroku, but I am struggling with one issue.
In this application, I have 2 dynos (one for the server, and the other for the client).
Since I want to get some data from the server, my client needs to know the IP address of the server (dyno).
Now I am trying to use Fixie and QuotaGuard Static.
They give me an IP address, but I cannot connect to the server using that IP address.
Could you tell me how to fix it?
You want to have two dynos communicate directly over a socket connection. Unfortunately, you can't easily do that; that runs counter to the ethos of Heroku and 12-factor application design (http://12factor.net), which specifies that processes should be isolated from each other, and that communication be via "network attached services". That second point may seem like a nuance, but it affects how the dynos discover the other services (via injected environment variables).
There are many reasons for this constraint, not the least of which is the fact that "dynos", as a unit of compute, may be scaled, migrated to different physical servers, etc., many times over an application's lifecycle. Trying to connect to a socket on a dyno reliably would actually get pretty complicated (selecting the right one if multiple are running, renegotiating connections after scaling/migration events, etc.). Remember - even if you are never going to call heroku ps:scale client=2, Heroku doesn't know that and, as a platform, it is designed to assume that you will.
The solution is to use an intermediate service like Redis to facilitate the inter-process communication via a framework like Python RQ or similar.
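A minimal sketch of that Redis/RQ approach, assuming a Redis add-on is attached and its URL is exposed in the REDIS_URL config var (the task module and queue name are placeholders):

    import os
    from redis import Redis
    from rq import Queue

    from myapp.tasks import fetch_data   # hypothetical task function in your codebase

    # one dyno enqueues work onto a queue that lives in Redis...
    conn = Redis.from_url(os.environ["REDIS_URL"])
    q = Queue("default", connection=conn)
    job = q.enqueue(fetch_data, "some-argument")

...and the other dyno runs an rq worker process pointed at the same Redis instance, so the two dynos never need to reach each other directly.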
Alternatively, treat the two dynos as separate applications - then you can connect from one to the other via HTTP using the publicly available DNS entry for that application. Note - in that case, it would still be possible to share a database if that's required.
Hope that helps.
I'm running a client which makes remote calls to my server, both written with twisted. The methods running on the server side can take quite a long time to return, and they are mostly CPU-bound in Python code, so (because of the GIL) threading won't be of any help here.
I've tried a lot of stuff but eventually I think I'm going to run several instances of twisted servers to distribute the tasks.
So I'm telling my servers to listen on several sockets (let's say I create them using serverFromString on socket_1 for server 1 and on socket_2 for server 2), and I'm connecting my client on these sockets with 2 calls to connectUNIX with socket_1 and socket_2 as arguments.
So far I managed to create the servers listening on the ports I want them to, but I'm not sure how to tell my client to distribute callRemote across the sockets. When I issue several callRemote calls, it seems that only one server is actually being used. How do I do that?
P.S.: I tried using multiprocessing, but my methods on the server side are full of unpicklable objects, so no chance; also, the API of spawnProcess is utterly incompatible with the code I'm calling. And I'm not willing to use an undocumented, unmaintained project, so Ampoule is not an option here.
Edit: No answer, so I guess the question wasn't clear enough. Basically it all boils down to this: can I pass a 'port' or 'socket' argument to callRemote so I can control which server the remote calls run on?
Basically I have a django/uwsgi client in which I call reactor.connectUNIX(my_port). I redirect all the calls made in my python code to a threads.blockingCallFromThread(reactor, callRemote, args). The remote calls go to my_port, though I don't really know where/how the argument is passed. On the server side the application is launched with twisted.scripts._twistd_unix.UnixApplicationRunner, and the server listens on my_port
I'd like to start several servers on different addresses and have my client dispatch the remote calls among the servers. I don't know if I'm being clear yet; I would gladly add more details.
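To make it concrete, this is roughly the shape of what I'm after, assuming Perspective Broker is what carries the callRemote (the socket paths and method name below are placeholders):

    import itertools
    from twisted.internet import reactor
    from twisted.spread import pb

    # one factory (and hence one root reference) per server socket
    factories = []
    for path in ("/tmp/socket_1", "/tmp/socket_2"):
        f = pb.PBClientFactory()
        reactor.connectUNIX(path, f)
        factories.append(f)

    connections = itertools.cycle(factories)   # naive round-robin dispatch

    def dispatch(method, *args):
        # pick the next connection and issue the remote call over it
        d = next(connections).getRootObject()
        d.addCallback(lambda root: root.callRemote(method, *args))
        return d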
I'm working on a project to expose a set of methods from various client machines to a server for the purpose of information gathering and automation. I'm using Python at the moment, and SimpleXMLRPCServer seems to work great on a local network, where I know the addresses of the client machines, and there's no NAT or firewall.
The problem is that the client/server model is backwards for what I want to do. Rather than have an RPC server running on the client machine, exposing a service to the software client, I'd like to have a server listening for connections from clients, which connect and expose the service to the server.
I'd thought about tunneling, remote port forwarding with SSH, or a VPN, but those options don't scale well, and introduce more overhead and complexity than I'd like.
I'm thinking I could write a server and client to reverse the model, but I don't want to reinvent the wheel if it already exists. It seems to me that this would be a common enough problem that there would be a solution for it already.
I'm also just cutting my teeth on Python and networked services, so it's possible I'm asking the wrong question entirely.
What you want is probably WAMP routed RPC.
It seems to address your issue and it's very convenient once you get used to it.
The idea is to put the WAMP router (let's say) in the cloud, and both RPC caller and RPC callee are clients with outbound connections to the router.
I was also using a VPN for connecting IoT devices together through the internet, but switching to this router model really simplified things, and it scales pretty well.
By the way, WAMP is implemented in different languages, including Python.
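For a flavour of it, a rough sketch of a callee registering a procedure with the autobahn library (the router URL, realm, and procedure name are placeholders):

    from autobahn.asyncio.wamp import ApplicationSession, ApplicationRunner

    class Callee(ApplicationSession):
        async def onJoin(self, details):
            def get_status():
                return {"status": "ok"}
            # expose the procedure to anything else connected to the router
            await self.register(get_status, "com.example.get_status")

    # outbound connection from the device to the router in the cloud
    ApplicationRunner("ws://router.example.com:8080/ws", "realm1").run(Callee)

A caller connects to the router the same way and invokes self.call("com.example.get_status"); the router takes care of getting the request to the callee.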
Maybe Pyro can be of use? It allows for many forms of distributed computing in Python. You are not very clear in your requirements so it is hard to say if this might work for you, but I advise you to have a look at the documentation or the many examples of Pyro and see if there's something that matches what you want to do.
Pyro abstracts most of the networking intricacy away; you simply invoke a method on a (remote) Python object.
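For a taste of what that looks like, a minimal Pyro4 sketch (class and method names are just examples):

    import Pyro4

    @Pyro4.expose
    class InfoService(object):
        def get_info(self):
            return "some data gathered on this machine"

    daemon = Pyro4.Daemon()              # network daemon on this machine
    uri = daemon.register(InfoService)   # yields a PYRO: uri for the object
    print("server uri:", uri)
    daemon.requestLoop()

and a client simply does:

    import Pyro4

    proxy = Pyro4.Proxy("PYRO:...")      # paste the uri printed by the server
    print(proxy.get_info())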
I want to write a Python script that will check the users local network for other instances of the script currently running.
For the purposes of this question, let's say that I'm writing an application that runs solely via the command line, and will just update the screen when another instance of the application is "found" on the local network. Sample output below:
$ python question.py
Thanks for running ThisApp! You are 192.168.1.101.
Found 192.168.1.102 running this application.
Found 192.168.1.104 running this application.
What libraries/projects exist to help facilitate something like this?
One way to do this would be to have each instance of the application in question broadcast UDP packets, while your application receives them from the different nodes and displays them. The Twisted networking framework provides facilities for doing such a job, and its documentation provides some simple examples too.
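A rough sketch of that idea with Twisted's DatagramProtocol (the port number and payload are arbitrary):

    from twisted.internet import reactor
    from twisted.internet.protocol import DatagramProtocol

    class Beacon(DatagramProtocol):
        def startProtocol(self):
            self.transport.setBroadcastAllowed(True)
            self.announce()

        def announce(self):
            # broadcast a heartbeat every 5 seconds
            self.transport.write(b"ThisApp", ("255.255.255.255", 9999))
            reactor.callLater(5, self.announce)

        def datagramReceived(self, data, addr):
            # in practice you would also filter out your own address here
            if data == b"ThisApp":
                print("Found %s running this application." % addr[0])

    reactor.listenUDP(9999, Beacon())
    reactor.run()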
Well, you could write something using the socket module. You would need two programs though: a server on the user's local computer, and a client program that interfaces with the server. The server would use the select module to listen for multiple connections; the client would send something to the server when it is run, or whenever you want it to. The server could then print out which connections it is maintaining, including details such as the IP address.
This is documented extremely well at the link below, in more depth than you need, but it will explain it to you as it did to me: http://ilab.cs.byu.edu/python/
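A bare-bones sketch of that select-based server (the port is arbitrary):

    import select
    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(('', 50000))
    server.listen(5)

    sockets = [server]
    while True:
        readable, _, _ = select.select(sockets, [], [])
        for s in readable:
            if s is server:
                conn, addr = server.accept()
                sockets.append(conn)
                print("Found %s running this application." % addr[0])
            elif not s.recv(4096):        # empty read: the client disconnected
                sockets.remove(s)
                s.close()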
You can try UDP broadcast; I found an example here: http://vizible.wordpress.com/2009/01/31/python-broadcast-udp/
You can have a server-based solution: a central server where clients register themselves, and query for other clients being registered. A server framework like Twisted can help here.
In a peer-to-peer setting, push technologies like UDP broadcasts can be used, where each client puts out a heartbeat packet every so often on the network, for others to receive. Basic modules like socket would help with that.
Alternatively, you could go for a pull approach, where the interested peer needs to discover the others actively. This is probably the least straightforward. For one, you need to scan the network, i.e. find out which IPs belong to the local network, and go through them. Then you would need to contact each IP in turn. If your program opens a TCP port, you could try to connect to it and find out whether your program is running there (see the sketch below). If you want your program to be completely ignorant of these queries, you might need to open an ssh connection to the remote IP and scan the process list for your program. All this might involve various modules and libraries. One you might want to look at is execnet.
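For completeness, a minimal sketch of the "contact each IP in turn" variant, assuming the application listens on a known TCP port (the subnet and port below are placeholders):

    import socket

    def find_peers(subnet="192.168.1.", port=55555, timeout=0.2):
        # probe every address on an assumed /24 subnet for a listening instance
        peers = []
        for host in range(1, 255):
            addr = subnet + str(host)
            try:
                with socket.create_connection((addr, port), timeout=timeout):
                    peers.append(addr)
            except OSError:
                pass
        return peers

    for addr in find_peers():
        print("Found %s running this application." % addr)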