Output reason for Python crash - python

I have an application which polls a bunch of servers every few minutes. To do this, it spawns one thread per server to poll (15 servers) and writes back the data to an object:
import requests

class ServerResults(object):
    def __init__(self):
        self.results = []

    def add_server(self, some_argument):
        self.results.append(some_argument)

def poll_server(server, results):
    response = requests.get(server, timeout=10)
    results.add_server(response.status_code)

servers = ['1.1.1.1', '1.1.1.2']
results = ServerResults()

for s in servers:
    t = CallThreads(poll_server, s, results)
    t.daemon = True
    t.start()
The CallThreads class is a helper that calls a function (in this case poll_server()) with arguments (in this case s and results); you can see the source in my GitHub repo of Python utility functions. Most of the time this works fine, but sometimes a thread hangs intermittently. I'm not sure why, since I am using a timeout on the GET request. In any case, if threads hang they build up over the course of hours or days, and eventually Python crashes:
File "/usr/lib/python2.7/threading.py", line 495, in start
_start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread
Exception in thread Thread-575 (most likely raised during interpreter shutdown)
Exception in thread Thread-1671 (most likely raised during interpreter shutdown)
Exception in thread Thread-831 (most likely raised during interpreter shutdown)
How might I deal with this? There seems to be no way to kill a blocking thread in Python. This application needs to run on a Raspberry Pi, so large libraries such as twisted won't fit; in fact I need to get rid of the requests library as well!

As far as I can tell, a possible scenario is that when a thread "hangs" for one given server, it stays there "forever". The next time you query your servers, another thread is spawned (_start_new_thread), up to the point where Python crashes.
Probably not your (main) problem, but you should:
use a thread pool - this won't stress the limited resources of your system as much as spawning new threads again and again.
check that you use a thread-safe mechanism to handle concurrent access to results. Maybe a semaphore or mutex to lock atomic portions of your code. Probably better would be a dedicated data structure such as a queue (a minimal sketch combining both ideas follows below).
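A minimal sketch of that combination, reusing the poll_server and ServerResults names from the question; the URLs are placeholders, and a lock-protected list stands in for a dedicated queue:
# Sketch: a fixed-size thread pool plus a lock-protected result object.
import threading
from multiprocessing.pool import ThreadPool

import requests

class ServerResults(object):
    def __init__(self):
        self._lock = threading.Lock()
        self.results = []

    def add_server(self, status_code):
        # Serialise access so concurrent workers can't corrupt the list.
        with self._lock:
            self.results.append(status_code)

def poll_server(args):
    server, results = args
    try:
        response = requests.get(server, timeout=10)
        results.add_server(response.status_code)
    except requests.RequestException as exc:
        results.add_server(str(exc))

servers = ['http://1.1.1.1', 'http://1.1.1.2']
results = ServerResults()

pool = ThreadPool(4)  # reuse 4 worker threads instead of one thread per server
pool.map(poll_server, [(s, results) for s in servers])
pool.close()
pool.join()
print(results.results)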
Concerning the "hang" per se -- beware that the timeout argument while "opening a URL" (urlopen) is related to the time-out for establishing the connection. Not for downloading the actual data:
The optional timeout parameter specifies a timeout in seconds for
blocking operations like the connection attempt (if not specified, the
global default timeout setting will be used). This actually only works
for HTTP, HTTPS and FTP connections.

Related

Multithreading: How to avoid hanging caused by worker thread erroring out

I created a script that executes multiple threads, where each thread makes a request to an API to retrieve some data. Unfortunately, one of the threads might run into a disconnection error (perhaps due to overloading the site's API), and as a result the entire Python script hangs indefinitely. How can I force the script to exit gracefully when one of the worker threads has a disconnection error? I thought using terminate would close the thread.
My code:
import sys
from multiprocessing import pool

runId = sys.argv[1]
trth = TrThDownload(runId)
data = trth.data
concurrences = min(len(data), 10)
p = pool.ThreadPool(concurrences)
p.map(trth.runDownloader, data)
p.terminate()
p.close()
p.join()
You really should try async programming. I prefer gevent. At the top of your script just do this:
from gevent import monkey
monkey.patch_all()
Also, don't terminate or close before your join. Just use join.
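A minimal sketch of that approach, with placeholder URLs and a generic download function standing in for trth.runDownloader (the monkey-patching must be the very first thing the script does, before other imports):
from gevent import monkey
monkey.patch_all()

from gevent.pool import Pool
import requests

def download(url):
    # With the patched socket module the timeout is cooperative: a slow or
    # dead endpoint times out instead of hanging the whole script.
    try:
        return requests.get(url, timeout=30).status_code
    except requests.RequestException as exc:
        return exc

urls = ['http://example.com/a', 'http://example.com/b']
pool = Pool(10)   # at most 10 concurrent downloads
results = pool.map(download, urls)
pool.join()       # wait for all greenlets; no terminate()/close() needed
print(results)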

Listening for events on a network and handling callbacks robustly

I am developing a small Python program for the Raspberry Pi that listens for some events on a Zigbee network.
The way I've written this is rather simplistic: I have a while(True): loop checking for a unique ID (UID) from the Zigbee. If a UID is received, it's looked up in a dictionary containing some callback methods. So, for instance, in the dictionary the key 101 is tied to a method called PrintHello().
So if that key/UID is received, the method PrintHello will be executed - pretty simple, like so:
if self.expectedCallBacks.has_key(UID) == True:
    self.expectedCallBacks[UID]()
I know this approach is probably too simplistic. My main concern is: what if the system is busy handling a method when it receives another message?
On an embedded MCU I could handle this easily with a circular buffer + interrupts, but I'm a bit lost when it comes to doing this on a RPi. Do I need to implement a new thread for the Zigbee module that basically fills a buffer that the callback handler can then retrieve/read from?
I would appreciate any suggestions on how to implement this more robustly.
Threads can definitely help to some degree here. Here's a simple example using a ThreadPool:
from multiprocessing.pool import ThreadPool

pool = ThreadPool(2)  # Create a 2-thread pool

while True:
    uid = zigbee.get_uid()
    if uid in self.expectedCallbacks:
        pool.apply_async(self.expectedCallbacks[uid])
That will kick off the callback in a thread in the thread pool, and should help prevent events from getting backed up before you can send them to a callback handler. The ThreadPool will internally handle queuing up any tasks that can't be run when all the threads in the pool are already doing work.
However, remember that the Raspberry Pi has only one CPU core, so you can't execute more than one CPU-bound operation concurrently (and that's even ignoring the limitations of threading in Python caused by the GIL, which is normally worked around by using multiple processes instead of threads). That means no matter how many threads/processes you have, only one can get access to the CPU at a time. For that reason, you probably don't want more than one thread actually running the callbacks: as you add more you'll just slow things down, because the OS has to keep switching between them.
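If you do want the dedicated reader thread plus buffer that the question describes, a queue gives you exactly that. A minimal sketch, where read_uid stands in for the blocking Zigbee read and callbacks for self.expectedCallBacks:
import threading
try:
    import queue             # Python 3
except ImportError:
    import Queue as queue    # Python 2

def run_dispatcher(read_uid, callbacks):
    events = queue.Queue()

    def reader():
        # Dedicated thread: pull UIDs off the radio as fast as they arrive.
        while True:
            events.put(read_uid())

    t = threading.Thread(target=reader)
    t.daemon = True          # don't keep the process alive just for the reader
    t.start()

    while True:
        uid = events.get()   # blocks until the reader has buffered something
        callback = callbacks.get(uid)
        if callback is not None:
            callback()       # one event at a time; the queue absorbs bursts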

EOFError with multiprocessing Manager

I have a bunch of clients connecting to a server via 0MQ. I have a Manager queue used for a pool of workers to communicate back to the main process on each client machine.
On just one client machine having 250 worker processes, I see a bunch of EOFErrors almost instantly. They occur at the point where the put() is being performed.
I would expect that a lot of communication might slow everything down, but I should never see EOFErrors in internal multiprocessing logic. I'm not using gevent or anything else that might break standard socket functionality.
Any thoughts on what could make puts to a Manager queue start raising EOFErrors?
For me the error was actually that my receiving process had thrown an exception and terminated, and so the sending process was receiving an EOFError, meaning that the interprocess communication pipeline had closed.
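A minimal sketch of that failure mode, with hypothetical names: if the receiving side (the process that owns the Manager) dies while workers are still calling put(), those put() calls start raising EOFError because the manager connection has gone away. Logging the receiver's own traceback usually reveals the real cause.
import traceback
from multiprocessing import Manager, Process

def handle(item):
    print(item)              # hypothetical processing step; imagine it can raise

def worker(q, n):
    q.put(n)                 # raises EOFError once the manager is gone

def main():
    manager = Manager()
    q = manager.Queue()
    procs = [Process(target=worker, args=(q, n)) for n in range(4)]
    for p in procs:
        p.start()
    try:
        for _ in procs:
            handle(q.get())
    except Exception:
        traceback.print_exc()  # surface the receiver's real error first
        raise
    finally:
        for p in procs:
            p.join()

if __name__ == '__main__':
    main()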

multiprocessing and sockets. How to wait?

I have a cluster with 4 nodes and a master server. The master dispatches jobs that may take from 30 seconds to 15 minutes to end.
The nodes are listening with a SocketServer.TCPServer and in the master, I open a connection and wait for the job to end.
import multiprocessing

def run(nodes, args):
    pool = multiprocessing.Pool(len(nodes))
    return pool.map(load_job, zip(nodes, args))
The load_job function sends the data with socket.sendall and, right after that, it uses socket.recv (the data takes a long time to arrive).
The program runs fine until about 200 or 300 of these jobs have run. When it breaks, socket.recv receives an empty string, and I cannot run any more jobs until I kill the node processes and start them again.
How should I wait for the data to come? Also, error handling in Pool is very poor, because it saves the error from another process and shows it without the proper traceback, and this error doesn't happen often enough to reproduce easily...
EDIT:
Now I think this problem has nothing to do with sockets:
After some research, it looks like my nodes are opening way too many processes (because they also run their jobs in a multiprocessing.Pool), and somehow those processes are not being closed!
I found these SO questions (here and here) talking about zombie processes when using multiprocessing in a daemonized process (exactly my case!).
I'll need to understand the problem further, but for now I'm killing the nodes and restarting them after some time.
(I'm replying to the question before the edit, because I don't understand exactly what you meant in it).
socket.recv is not the best way to wait for data on a socket. The best way I know is to use the select module (documentation here). The simplest use when waiting for data on a single socket would be select.select([your_socket],[],[]), but it can certainly be used for more complex tasks as well.
Regarding the issue of socket.recv receiving an empty string: when the socket is a TCP socket (as it is in your case), this means the socket has been closed by the peer.
Reasons for this may vary, but the important thing to understand is that after this happens, you will no longer receive any data from this socket, so the best thing you can do with it is close it (socket.close). If you don't expect it to close, this is where you should search for the problem.
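A minimal sketch of that pattern, assuming sock is an already-connected TCP socket: wait for readability with a timeout, and treat an empty recv() as the peer having closed.
import select

def recv_with_wait(sock, timeout=30.0):
    # Wait up to `timeout` seconds for the socket to become readable.
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None          # timed out: still no data, caller decides what to do
    data = sock.recv(4096)
    if not data:
        # An empty result on a TCP socket means the peer closed the connection.
        sock.close()
        raise RuntimeError("peer closed the connection")
    return data
The caller can loop on recv_with_wait() until it has a full message, retrying or giving up when a None timeout comes back.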
Good luck!

Gracefully Terminating Python Threads

I am trying to write a Unix client program that listens on a socket and stdin, and reads from file descriptors. I assign each of these tasks to an individual thread and have them successfully communicating with the "main" application using synchronized queues and a semaphore. The problem is that when I want to shut down these child threads, they are all blocking on input. Also, the threads cannot register signal handlers, because in Python only the main thread of execution is allowed to do so.
Any suggestions?
There is no good way to work around this, especially when the thread is blocking.
I had a similar issue (Python: How to terminate a blocking thread), and the only way I was able to stop my threads was to close the underlying connection, which caused the blocking thread to raise an exception and then allowed me to check the stop flag and close.
Example code:
import threading
from threading import Thread

class Example(object):
    def __init__(self):
        self.stop = threading.Event()
        self.connection = Connection()
        self.mythread = Thread(target=self.dowork)
        self.mythread.start()

    def dowork(self):
        while not self.stop.is_set():
            try:
                blockingcall()
            except CommunicationException:
                pass

    def terminate(self):
        self.stop.set()
        self.connection.close()
        self.mythread.join()
Another thing to note is that blocking operations generally offer a timeout. If you have that option, I would consider using it. My last comment is that you could always set the thread to daemonic.
From the pydoc:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property.
Also, the threads cannot register signal handlers
Using signals to kill threads is potentially dangerous, especially in C, and especially if you allocate memory as part of the thread, since it won't be freed when that particular thread dies (it belongs to the heap of the process). There is no garbage collection in C, so if that pointer goes out of scope, the memory simply remains allocated. So be careful with that one - only do it that way in C if you're going to kill all the threads and end the process anyway, so that the memory is handed back to the OS; adding and removing threads from a thread pool this way, for example, will give you a memory leak.
The problem is that when I want to shutdown these child threads they are all blocking on input.
Funnily enough I've been fighting with the same thing recently. The solution is literally don't make blocking calls without a timeout. So, for example, what you want ideally is:
def threadfunc(running):
    while running.is_set():
        blockingcall(timeout=1)
where running is passed in from the controlling thread - I've never used threading, but I have used multiprocessing, and there you pass an Event() object and check is_set(), as above. But you asked for design patterns; that's the basic idea.
Then, when you want this thread to end, you run:
running.clear()
mythread.join()
and your main thread should then allow your client thread to handle its last call, and return, and the whole program folds up nicely.
What do you do if you have a blocking call without a timeout? Use the asynchronous option, and sleep (as in call whatever method you have to suspend the thread for a period of time so you're not spinning) if you need to. There's no other way around it.
See these answers:
Python SocketServer
How to exit a multithreaded program?
Basically, avoid blocking on recv(): use select() with a timeout to check the socket for readability, and poll a quit flag whenever select() times out.
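A minimal sketch of that loop, assuming sock is a connected socket, quit_flag is a threading.Event shared with the main thread, and handle_data is whatever processes incoming bytes:
import select

def reader_thread(sock, quit_flag, handle_data):
    while not quit_flag.is_set():
        # Wake up at least once a second to re-check the quit flag.
        readable, _, _ = select.select([sock], [], [], 1.0)
        if not readable:
            continue
        data = sock.recv(4096)
        if not data:
            break            # peer closed the connection
        handle_data(data)
    sock.close()
The main thread then shuts the reader down with quit_flag.set() followed by a join() on the thread.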
