multiprocessing and sockets. How to wait?

multiprocessing and sockets. How to wait? - python

I have a cluster with 4 nodes and a master server. The master dispatches jobs that may take from 30 seconds to 15 minutes to end.
The nodes are listening with a SocketServer.TCPServer and in the master, I open a connection and wait for the job to end.
def run(nodes, args):
pool = multiprocessing.Pool(len(nodes))
return pool.map(load_job, zip(nodes, args))
the load_job function sends the data with socket.sendall and right after that, it uses socket.recv (The data takes a long time to arrive).
The program runs fine until about 200 or 300 of theses jobs run. When it breaks, the socket.recv receives an empty string and cannot run any more jobs until I kill the node processes and run them again.
How should I wait for the data to come? Also, error handling in pool is very poor because it saves the error from another process and show without the proper traceback and this error is not so common to repeat...
EDIT:
Now I think this problem has nothing to do with sockets:
After some research, looks like my nodes are opening way to many processes (because they also run their jobs in a multiprocessing.Pool) and somehow they are not being closed!
I found these SO question (here and here) talking about zombie processes when using multiprocessing in a daemonized process (exactly my case!).
I'll need to further understand the problem, but for now I'm killing the nodes and restoring them after some time.

(I'm replying to the question before the edit, because I don't understand exactly what you meant in it).
socket.recv is not the best way to wait for data on a socket. The best way I know is to use the select module (documentation here). The simplest use when waiting for data on a single socket would be select.select([your_socket],[],[]), but it can certainly be used for more complex tasks as well.
Regarding the issue of socket.recv receives an empty string; When the socket is a TCP socket (as it is in your case), this means the socket has been closed by the peer.
Reasons for this may vary, but the important thing to understand is that after this happens, you will no longer receive any data from this socket, so the best thing you can do with it is close it (socket.close). If you don't expect it to close, this is where you should search for the problem.
Good luck!

Related

How to handle a burst of connection to a port?

I've built a server listening on a specific port on my server using Python (asyncore and sockets) and I was curious to know if there was anything possible to do when there is too many people connecting at once on my server.
The code in itself cannot be changed, but will adding more process works? or is it from an hardware perspective and I should focus on adding a load balancer in front and balancing the requests on multiple servers?
This questions is borderline StackOverflow (code/python) and ServerFault (server management). I decided to go with SO because of the code, but if you think ServerFault is better, let me know.

1.
asyncore relies on operating system for whole connection handling, therefore what you are asking is OS dependent. It has very little to do with Python. Using twisted instead of asyncore wouldn't solve your problem.
On Windows, for example, you can listen only for 5 connections coming in simultaneously.
So, first requirement is, run it on *nix platform.
The rest depends on how long your handlers are taking and on your bandwith.
2.
What you can do is combine asyncore and threading to speed-up waiting for next connection.
I.e. you can make Handlers that are running in separate threads. It will be a little messy but it is one of possible solutions.
When server accepts a connection, instead of creating new traditional handler (which would slow down checking for following connection - because asyncore waits until that handler does at least a little bit of its job), you create a handler that deals with read and write as non-blocking.
I.e. it starts a thread and does the job, then, when it has data ready, only then sends it upon following loop()'s check.
This way, you allow asyncore.loop() to check the server's socket more often.
3.
Or you can use two different socket_maps with two different asyncore.loop()s.
You use one map (dictionary), let say the default one - asyncore.socket_map to check the server, and use one asyncore.loop(), let say in main thread, only for server().
And you start the second asyncore.loop() in a thread using your custom dictionary for client handlers.
So, One loop is checking only server that accepts connections, and when it arrives, it creates a handler which goes in separate map for handlers, which is checked by another asyncore.loop() running in a thread.
This way, you do not mix the server connection checks and client handling. So, server is checked immediately after it accepts one connection. The other loop balances between clients.
If you are determined to go even faster, you can exploit the multiprocessor computers by having more maps for handlers.
For example, one per CPU and as many threads with asyncore.loop()s.
Note, sockets are IO operations using system calls and select() is one too, therefore GIL is released while asyncore.loop() is waiting for results. This means, that you will have total advantage of multithreading and each CPU will deal with its number of clients in literally parallel way.
What you would have to do is make the server distributing the load and starting threading loops upon connection arrivals.
Don't forget that asyncore.loop() ends when the map empties. So the loop() in a thread that manages clients must be started when new connection is accepted and restarted if at some time there are no more connections present.
4.
If you want to be able to run your server on multiple computers and use them as a cluster, then you install the process balancer in front.
I do not see the serious need for it if you wrote the asyncore server correctly and want to run it on single computer only.

Is it possible to prevent python's http.client.HTTPResponse.read() from hanging when there is no data?

I'm using Python http.client.HTTPResponse.read() to read data from a stream. That is, the server keeps the connection open forever and sends data periodically as it becomes available. There is no expected length of response. In particular, I'm getting Tweets through the Twitter Streaming API.
To accomplish this, I repeatedly call http.client.HTTPResponse.read(1) to get the response, one byte at a time. The problem is that the program will hang on that line if there is no data to read, which there isn't for large periods of time (when no Tweets are coming in).
I'm looking for a method that will get a single byte of the HTTP response, if available, but that will fail instantly if there is no data to read.
I've read that you can set a timeout when the connection is created, but setting a timeout on the connection defeats the whole purpose of leaving it open for a long time waiting for data to come in. I don't want to set a timeout, I want to read data if there is data to be read, or fail if there is not, without waiting at all.
I'd like to do this with what I have now (using http.client), but if it's absolutely necessary that I use a different library to do this, then so be it. I'm trying to write this entirely myself, so suggesting that I use someone else's already-written Twitter API for Python is not what I'm looking for.
This code gets the response, it runs in a separate thread from the main one:
while True:
try:
readByte = dc.request.read(1)
except:
readByte = []
if len(byte) != 0:
dc.responseLock.acquire()
dc.response = dc.response + chr(byte[0])
dc.responseLock.release()
Note that the request is stored in dc.request and the response in dc.response, these are created elsewhere. dc.responseLock is a Lock that prevents dc.response from being accessed by multiple threads at once.
With this running on a separate thread, the main thread can then get dc.response, which contains the entire response received so far. New data is added to dc.response as it comes in without blocking the main thread.
This works perfectly when it's running, but I run into a problem when I want it to stop. I changed my while statement to while not dc.twitterAbort, so that when I want to abort this thread I just set dc.twitterAbort to True, and the thread will stop.
But it doesn't. This thread remains for a very long time afterward, stuck on the dc.request.read(1) part. There must be some sort of timeout, because it does eventually get back to the while statement and stop the thread, but it takes around 10 seconds for that to happen.
How can I get my thread to stop immediately when I want it to, if it's stuck on the call to read()?
Again, this method is working to get Tweets, the problem is only in getting it to stop. If I'm going about this entirely the wrong way, feel free to point me in the right direction. I'm new to Python, so I may be overlooking some easier way of going about this.

Your idea is not new, there are OS mechanisms(*) for making sure that an application is only calling I/O-related system calls when they are guaranteed to be not blocking . These mechanisms are usually used by async I/O frameworks, such as tornado or gevent. Use one of those, and you will find it very easy to run code "while" your application is waiting for an I/O event, such as waiting for incoming data on a socket.
If you use gevent's monkey-patching method, you can proceed using http.client, as requested. You just need to get used to the cooperative scheduling paradigm introduced by gevent/greenlets, in which your execution flow "jumps" between sub-routines.
Of course you can also perform blocking I/O in another thread (like you did), so that it does not affect the responsiveness of your main thread. Regarding your "How can I get my thread to stop immediately" problem:
Forcing a thread that's blocking in a system call to stop is usually not a clean or even valid process (also see Is there any way to kill a Thread in Python?). Either -- if your application has finished its jobs -- you take down the entire process, which also affects all contained threads, or you just leave the thread be and give it as much time to terminate as required (these 10 seconds you were referring to are not a problem -- are they?)
If you do not want to have such long-blocking system calls anywhere in your application (be it in the main thread or not), then use above-mentioned techniques to prevent blocking system calls.
(*) see e.g. O_NONBLOCK option in http://man7.org/linux/man-pages/man2/open.2.html

Listening for events on a network and handling callbacks robostly

I am developing a small Python program for the Raspberry Pi that listens for some events on a Zigbee network.
The way I've written this is rather simplisic, I have a while(True): loop checking for a Uniquie ID (UID) from the Zigbee. If a UID is received it's sent to a dictionary containing some callback methods. So, for instance, in the dictionary the key 101 is tied to a method called PrintHello().
So if that key/UID is received method PrintHello will be executed - pretty simple, like so:
if self.expectedCallBacks.has_key(UID) == True:
self.expectedCallBacks[UID]()
I know this approach is probably too simplistic. My main concern is, what if the system is busy handling a method and the system receives another message?
On an embedded MCU I can handle easily with a circuler buffer + interrupts but I'm a bit lost with it comes to doing this with a RPi. Do I need to implement a new thread for the Zigbee module that basically fills a buffer that the call back handler can then retrieve/read from?
I would appreciate any suggestions on how to implement this more robustly.

Threads can definitely help to some degree here. Here's a simple example using a ThreadPool:
from multiprocessing.pool import ThreadPool
pool = ThreadPool(2) # Create a 2-thread pool
while True:
uid = zigbee.get_uid()
if uid in self.expectedCallbacks:
pool.apply_async(self.expectedCallbacks[UID])
That will kick off the callback in a thread in the thread pool, and should help prevent events from getting backed up before you can send them to a callback handler. The ThreadPool will internally handle queuing up any tasks that can't be run when all the threads in the pool are already doing work.
However, remember that Raspberry Pi's have only one CPU core, so you can't execute more than one CPU-based operation concurrently (and that's even ignoring the limitations of threading in Python caused by the GIL, which is normally solved by using multiple processes instead of threads). That means no matter how many threads/processes you have, only one can get access to the CPU at a time. For that reason, you probably don't want more than one thread actually running the callbacks, since as you add more you're just going to slow things down, due to the OS needing to constantly switch between threads.

Why is only 2 of 10 selecting process notified when a socket become readable?

There are 10 processes sharing a socket in my application.
They all wait for it to become readable using select.
But I notice in the application log that only 2 of these 10 processes any time the socket become readable.
What could be the reason?

I suspect what's happening is that the first process is waking up, returning from select(), and calling accept() before the subsequent context switch to the other processes can occur.
I'm not sure what select() actually blocks on or how it wakes up. I suspect when it does wake up from it's waiting on, it re-checks the queue to see if data is still available. If not, it goes back to waiting.
I'll double-down on my hypothesis as well. The fact that 2 processes are waking up is indicative of the fact that you have a dual-core processor. If you had a quad-core, you might see up to 4 processes wake up simultaneously.
One simple way to prove this theory: Put a 2 second sleep() call just prior to calling accept(). I suspect you'll see all 10 processes waking up and logging an attempt to call accept.
If your goal is to have N processes (or threads) servicing incoming connections, your approach is probably still good. You could probably switch from doing a select() call on a non-blocking socket, to just using a blocking socket that calls accept() directly. When an incoming connection comes in, one of the processes will return from accept() with a valid client socket handle. The others will still remain blocked.

How do I limit the number of active threads in python?

Am new to python and making some headway with threading - am doing some music file conversion and want to be able to utilize the multiple cores on my machine (one active conversion thread per core).
class EncodeThread(threading.Thread):
# this is hacked together a bit, but should give you an idea
def run(self):
decode = subprocess.Popen(["flac","--decode","--stdout",self.src],
stdout=subprocess.PIPE)
encode = subprocess.Popen(["lame","--quiet","-",self.dest],
stdin=decode.stdout)
encode.communicate()
# some other code puts these threads with various src/dest pairs in a list
for proc in threads: # `threads` is my list of `threading.Thread` objects
proc.start()
Everything works, all the files get encoded, bravo! ... however, all the processes spawn immediately, yet I only want to run two at a time (one for each core). As soon as one is finished, I want it to move on to the next on the list until it is finished, then continue with the program.
How do I do this?
(I've looked at the thread pool and queue functions but I can't find a simple answer.)
Edit: maybe I should add that each of my threads is using subprocess.Popen to run a separate command line decoder (flac) piped to stdout which is fed into a command line encoder (lame/mp3).

If you want to limit the number of parallel threads, use a semaphore:
threadLimiter = threading.BoundedSemaphore(maximumNumberOfThreads)
class EncodeThread(threading.Thread):
def run(self):
threadLimiter.acquire()
try:
<your code here>
finally:
threadLimiter.release()
Start all threads at once. All but maximumNumberOfThreads will wait in threadLimiter.acquire() and a waiting thread will only continue once another thread goes through threadLimiter.release().

"Each of my threads is using subprocess.Popen to run a separate command line [process]".
Why have a bunch of threads manage a bunch of processes? That's exactly what an OS does that for you. Why micro-manage what the OS already manages?
Rather than fool around with threads overseeing processes, just fork off processes. Your process table probably can't handle 2000 processes, but it can handle a few dozen (maybe a few hundred) pretty easily.
You want to have more work than your CPU's can possibly handle queued up. The real question is one of memory -- not processes or threads. If the sum of all the active data for all the processes exceeds physical memory, then data has to be swapped, and that will slow you down.
If your processes have a fairly small memory footprint, you can have lots and lots running. If your processes have a large memory footprint, you can't have very many running.

If you're using the default "cpython" version then this won't help you, because only one thread can execute at a time; look up Global Interpreter Lock. Instead, I'd suggest looking at the multiprocessing module in Python 2.6 -- it makes parallel programming a cinch. You can create a Pool object with 2*num_threads processes, and give it a bunch of tasks to do. It will execute up to 2*num_threads tasks at a time, until all are done.
At work I have recently migrated a bunch of Python XML tools (a differ, xpath grepper, and bulk xslt transformer) to use this, and have had very nice results with two processes per processor.

It looks to me that what you want is a pool of some sort, and in that pool you would like the have n threads where n == the number of processors on your system. You would then have another thread whose only job was to feed jobs into a queue which the worker threads could pick up and process as they became free (so for a dual code machine, you'd have three threads but the main thread would be doing very little).
As you are new to Python though I'll assume you don't know about the GIL and it's side-effects with regard to threading. If you read the article I linked you will soon understand why traditional multithreading solutions are not always the best in the Python world. Instead you should consider using the multiprocessing module (new in Python 2.6, in 2.5 you can use this backport) to achieve the same effect. It side-steps the issue of the GIL by using multiple processes as if they were threads within the same application. There are some restrictions about how you share data (you are working in different memory spaces) but actually this is no bad thing: they just encourage good practice such as minimising the contact points between threads (or processes in this case).
In your case you are probably intersted in using a pool as specified here.

Short answer: don't use threads.
For a working example, you can look at something I've recently tossed together at work. It's a little wrapper around ssh which runs a configurable number of Popen() subprocesses. I've posted it at: Bitbucket: classh (Cluster Admin's ssh Wrapper).
As noted, I don't use threads; I just spawn off the children, loop over them calling their .poll() methods and checking for timeouts (also configurable) and replenish the pool as I gather the results. I've played with different sleep() values and in the past I've written a version (before the subprocess module was added to Python) which used the signal module (SIGCHLD and SIGALRM) and the os.fork() and os.execve() functions --- which my on pipe and file descriptor plumbing, etc).
In my case I'm incrementally printing results as I gather them ... and remembering all of them to summarize at the end (when all the jobs have completed or been killed for exceeding the timeout).
I ran that, as posted, on a list of 25,000 internal hosts (many of which are down, retired, located internationally, not accessible to my test account etc). It completed the job in just over two hours and had no issues. (There were about 60 of them that were timeouts due to systems in degenerate/thrashing states -- proving that my timeout handling works correctly).
So I know this model works reliably. Running 100 current ssh processes with this code doesn't seem to cause any noticeable impact. (It's a moderately old FreeBSD box). I used to run the old (pre-subprocess) version with 100 concurrent processes on my old 512MB laptop without problems, too).
(BTW: I plan to clean this up and add features to it; feel free to contribute or to clone off your own branch of it; that's what Bitbucket.org is for).

I am not an expert in this, but I have read something about "Lock"s. This article might help you out
Hope this helps

I would like to add something, just as a reference for others looking to do something similar, but who might have coded things different from the OP. This question was the first one I came across when searching and the chosen answer pointed me in the right direction. Just trying to give something back.
import threading
import time
maximumNumberOfThreads = 2
threadLimiter = threading.BoundedSemaphore(maximumNumberOfThreads)
def simulateThread(a,b):
threadLimiter.acquire()
try:
#do some stuff
c = a + b
print('a + b = ',c)
time.sleep(3)
except NameError: # Or some other type of error
# in case of exception, release
print('some error')
threadLimiter.release()
finally:
# if everything completes without error, release
threadLimiter.release()
threads = []
sample = [1,2,3,4,5,6,7,8,9]
for i in range(len(sample)):
thread = threading.Thread(target=(simulateThread),args=(sample[i],2))
thread.daemon = True
threads.append(thread)
thread.start()
for thread in threads:
thread.join()
This basically follows what you will find on this site:
https://www.kite.com/python/docs/threading.BoundedSemaphore

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.