What could make a connection.send() block ? (from conn1, conn2 = multiprocessing.Pipe() ) - python

I am debugging application that gather information from 2 sensors : a webcam and a microphone.
The general architecture is quite simple :
the main process sends messages (start, stop, get_data) via pipes to the child processes (one for each).
child processes gather the data and send it to the main process
Child & main processes are in infinite loops to process commands (the main process from the user, the child process from the main process).
It globally works but I have trouble stopping the child processes.
I have logged the code and it seems to happen 2 things :
The 'stop' message is sent but doesn't get through the pipe.
The child process continue to send data and the conn.send(data) blocks.
The behavior is clearly linked to the state of the connection, as child processes that send nothing back don't have this behavior. Still, I don't see how to debug/modify the current architecture which seems reasonnable.
So, what cause this blocking behavior and how to avoid it ?
This is the code which is executed for each iteration of the infinite loop in the child process :
def do(self):
while self.cnx.poll():
msg = self.cnx.recv()
self.queue.append(msg)
#==
if not self.queue:
func_name = 'default_action'
self.queue.append([func_name, ])
#==
msg = self.queue.pop()
func_name, args = msg[0], msg[1:]
#==
res = self.target.__getattribute__(func_name)(*args)
#==
running = func_name != 'stop'
#==
if res and self.send:
assert running
self.output_queue.append(res[0])
if self.output_queue and running:
self.cnx.send(self.output_queue.popleft())
#==
return running
update : it seems that the Pipe cannot be written simultaneously on both end. It works if change the last few lines of the above code to :
if self.output_queue and running:
if not self.cnx.poll():
self.cnx.send(self.output_queue.popleft())
The question stays open though as Pipe are documented as full duplex by default and this behavior is not documented at all. I must have misunderstood something. Please, enlight me!
update 2 : just to be clear, no connection is closed during in this situation. To describe the sequence of events :
the main process sends a messsage ("stop") (it empties the connection before sending the message)
the main process enter an (infinite) loop that stops when the child process is terminated.
meanwhile, the child process is blocked in the send and never gets the message.

A full duplex multiprocessing.Pipe is implemented as socketpair(). Calling .send can block for all the normal reasons when talking to a socket. Based on your description I think it's likely that the reader of your Pipe has quit reading and data has built up in the buffers in the kernel to the point where your .send blocks.
If you explicitly .close the receiving side you'll probably get some kind of error (although possibly SIGPIPE as well, not sure) when you try to .send. If your receiving connection was going out of scope this would probably happen automatically. You may be able to fix the problem by just being more careful not to store references (direct or indirect) to the receiving side so it gets deallocated when that thread goes away.
Trivial demo of blocking .send:
import multiprocessing
a, b = multiprocessing.Pipe()
while True:
print "send!"
a.send("hello world")
Now note that after a while it quits printing "send!"

Related

Python Multiprocessing processes not closing with join()

I am running a relatively large Python program which involves a handful of processes running at the same time, which is achieved using the multiprocessing library. I am experiencing an issue where 1 of my processes will not quit, so when I try to exit the program (with CTRL+C) it just hangs forever (The only way to close it is to force close the Python.exe process from task manager). For every other process that I have, when I call process.join(timeout=1), it closes the process. However, for this one specific process it just never closes (I was only able to identify this after putting a print statement after every .join() and seeing that there is only 1 process that never reaches the print statement).
Does anyone know why this might be happening, and how I can get this process to close? I saw somewhere else that this might be due to the process having a non-empty Queue, but this specific process only has 1 mp.Queue that I am clearing right before I close it:
class MyClass():
def __init__(self):
self.queue = mp.Queue()
self.bad_process = mp.Process(target=some_func)
self.bad_process.start()
...
def close(self):
# Clear queue before closing
while not self.queue.empty():
self.queue.get()
print("This line prints")
self.bad_process.join(timeout=1)
print("Never reaches here, always hanging")

Function within Worker/Child instance does not return, freezes program

I am using the multiprocessing module in python. Here is a sample of the code I am using:
import multiprocessing as mp
def function(fun_var1, fun_var2):
b = fun_var1 + fun_var2
# and more computationally intensive stuff happens here
return b
# my program freezes after the return command
class Worker(mp.Process):
def __init__(self, queue_obj, func_var1, func_var2):
mp.Process.__init__(self)
self.queue_obj = queue_obj
self.func_var1 = func_var1
self.func_var2 = func_var2
def run(self):
self.var = function( self.func_var1, self.func_var2 )
self.queue_obj.put(self.var)
if __name__ == '__main__':
mp.freeze_support()
queue_list = []
processes = []
result = []
for i in range(2):
queue_list.append(mp.Queue())
processes.append( Worker(queue_list[i], i, var1, var2 )
processes[i].start()
for i in range(2):
processes[i].join()
result.append(queue_list[i].get())
During runtime of the program two instances of the worker class are generated which work simultaneously. One instance finishes after about 2 minutes and the other would take about 7 minutes. The first instance returns its results fine. However, the second instance freezes the program when the function() that is called within the run() method returns its value. No error is being thrown, the program just does not continue to execute. The console also indicates that it is busy but not displaying the >>> prompt. I am completely clueless why this behavior occurs. The same code works fine for slightly different inputs in the two Worker instances. The only difference I can make out is that the work loads are more equal when it executes correctly. Could the time difference cause trouble? Does anyone have experience with this kind of behavior? Also note that if I run a serial setup of the program in which function() is just called twice by the main program, the code executes flawlessly. Could there be some timeout involved in the worker instance that makes it impossible for function() to return its value to the Worker instance? The return value of function() is actually a list that is fairly small. It contains about 100 float values.
Any suggestions are welcomed!
This is a bit of an educated guess without actually seeing what's going on in worker, but is it possible that your child has put items into the Queue that haven't been consumed? The documentation has a warning about this:
Warning
As mentioned above, if a child process has put items on a queue (and
it has not used JoinableQueue.cancel_join_thread), then that process
will not terminate until all buffered items have been flushed to the
pipe.
This means that if you try joining that process you may get a deadlock
unless you are sure that all items which have been put on the queue
have been consumed. Similarly, if the child process is non-daemonic
then the parent process may hang on exit when it tries to join all its
non-daemonic children.
Note that a queue created using a manager does not have this issue.
See Programming guidelines.
It might be worth trying to create your Queue object using mp.Manager.Queue and see if the issue goes away.

How do I cleanly exit from a multiprocessing script?

I am building a non-blocking chat application for my website, and I decided to implement some multiprocessing to deal with DB querying and real-time messaging.
I assume that when a user lands on a given URL to see their conversation with the other person, I will fire off the script, the multiprocessing will begin, the messages will be added to a queue and displayed on the page, new messages will be sent to a separate queue that interacts with the DB, etc. (Regular message features ensue.)
However, what happens when the user leaves this page? I assume I need to exit these various processes, but currently, this does not lend itself to a "clean" exit. I would have to terminate processes and according to the multiprocessing docs:
Warning: If this method (terminate()) is used when the associated process is using a pipe
or queue then the pipe or queue is liable to become corrupted and may become
unusable by other process.Similarly, if the process has acquired a lock or
semaphore etc. then terminating it is liable to cause other processes to
deadlock.
I have also looked into sys.exit(); however, it doesn't fully exit the script without the use of terminate() on the various processes.
Here is my code that is simplified for the purposes of this question. If I need to change it, that's completely fine. I simply want to make sure I am going about this appropriately.
import multiprocessing
import Queue
import time
import sys
## Get all past messages
def batch_messages():
# The messages list here will be attained via a db query
messages = [">> This is the message.", ">> Hello, how are you doing today?", ">> Really good!"]
for m in messages:
print m
## Add messages to the DB
def add_messages(q2):
while True:
# Retrieve from the queue
message_to_db = q2.get()
# For testing purposes only; perfrom another DB query to add the message to the DB
print message_to_db, "(Add to DB)"
## Recieve new, inputted messages.
def receive_new_message(q1, q2):
while True:
# Add the new message to the queue:
new_message = q1.get()
# Print the message to the (other user's) screen
print ">>", new_message
# Add the q1 message to q2 for databse manipulation
q2.put(new_message)
def shutdown():
print "Shutdown initiated"
p_rec.terminate()
p_batch.terminate()
p_add.terminate()
sys.exit()
if __name__ == "__main__":
# Set up the queue
q1 = multiprocessing.Queue()
q2 = multiprocessing.Queue()
# Set up the processes
p_batch = multiprocessing.Process(target=batch_messages)
p_add = multiprocessing.Process(target=add_messages, args=(q2,))
p_rec = multiprocessing.Process(target=receive_new_message, args=(q1, q2,))
# Start the processes
p_batch.start() # Perfrom batch get
p_rec.start()
p_add.start()
time.sleep(0.1) # Test: Sleep to allow proper formatting
while True:
# Enter a new message
input_message = raw_input("Type a message: ")
# TEST PURPOSES ONLY: shutdown
if input_message == "shutdown_now":
shutdown()
# Add the new message to the queue:
q1.put(input_message)
# Let the processes catch up before printing "Type a message: " again. (Shell purposes only)
time.sleep(0.1)
How should I deal with this situation? Does my code need to be fundamentally revised?, and if so, what should I do to fix it?
Any thoughts, comments, revisions, or resources appreciated.
Thank you!
Disclaimer: I don't actually know python. But multithreadding concepts are similar enough in all the langauges I do know that I feel confident enough to try to answer anyway.
When using multiple threads/proccesses each one should have a step in it's loop to check a variable, (I often call the variable "active", or "keepGoing" or something and it's usually a boolean.)
That variable is usually either shared between the threads, or sent as a message to each thread depending on your programming language and when you want the proccessing to stop, (finish your work first y/n?)
Once the variable is set all threads quit their proccessing loops and proceed to exit their threads.
In your case you have a loop "while true". This never exits. Change it to exit when a variable is set and the thread should close itself when the function exit is reached.

Asynchronous KeyboardInterrupt and multithreading

It seems that asynchronous signals in multithreaded programs are not correctly handled by Python. But, I thought I would check here to see if anyone can spot a place where I am violating some principle, or misunderstanding some concept.
There are similar threads I've found here on SO, but none that seem to be quite the same.
The scenario is: I have two threads, reader thread and writer thread (main thread). The writer thread writes to a pipe that the reader thread polls. The two threads are coordinated using a threading.Event() primitive (which I assume is implemented using pthread_cond_wait). The main thread waits on the Event while the reader thread eventually sets it.
But, if I want to interrupt my program while the main thread is waiting on the Event, the KeyboardInterrupt is not handled asynchronously.
Here is a small program to illustrate my point:
#!/usr/bin/python
import os
import sys
import select
import time
import threading
pfd_r = -1
pfd_w = -1
reader_ready = threading.Event()
class Reader(threading.Thread):
"""Read data from pipe and echo to stdout."""
def run(self):
global pfd_r
while True:
if select.select([pfd_r], [], [], 1)[0] == [pfd_r]:
output = os.read(pfd_r, 1000)
sys.stdout.write("R> '%s'\n" % output)
sys.stdout.flush()
# Suppose there is some long-running processing happening:
time.sleep(10)
reader_ready.set()
# Set up pipe.
(pfd_r, pfd_w) = os.pipe()
rt = Reader()
rt.daemon = True
rt.start()
while True:
reader_ready.clear()
user_input = raw_input("> ").strip()
written = os.write(pfd_w, user_input)
assert written == len(user_input)
# Wait for reply -- Try to ^C here and it won't work immediately.
reader_ready.wait()
Start the program with './bug.py' and enter some input at the prompt. Once you see the reader reply with the prefix 'R>', try to interrupt using ^C.
What I see (Ubuntu Linux 10.10, Python 2.6.6) is that the ^C is not handled until after the blocking reader_ready.wait() returns. What I expected to see is that the ^C is raised asynchronously, resulting in the program terminating (because I do not catch KeyboardInterrupt).
This may seem like a contrived example, but I'm running into this in a real-world program where the time.sleep(10) is replaced by actual computation.
Am I doing something obviously wrong, like misunderstanding what the expected result would be?
Edit: I've also just tested with Python 3.1.1 and the same problem exists.
The wait() method of a threading._Event object actually relies on a thread.lock's acquire() method. However, the thread documentation states that a lock's acquire() method cannot be interrupted, and that any KeyboardInterrupt exception will be handled after the lock is released.
So basically, this is working as intended. Threading objects that implement this behavior rely on a lock at some point (including queues), so you might want to choose another path.
Alternatively, you could also use the pause() function of the signal module instead of reader_ready.wait(). signal.pause() is a blocking function and gets unblocked when a signal is received by the process. In your case, when ^C is pressed, SIGINT signal unblocks the function.
According to the documentation, the function is not available for Windows. I've tested it on Linux and it works. I think this is better than using wait() with a timeout.

Proper way of cancelling accept and closing a Python processing/multiprocessing Listener connection

(I'm using the pyprocessing module in this example, but replacing processing with multiprocessing should probably work if you run python 2.6 or use the multiprocessing backport)
I currently have a program that listens to a unix socket (using a processing.connection.Listener), accept connections and spawns a thread handling the request. At a certain point I want to quit the process gracefully, but since the accept()-call is blocking and I see no way of cancelling it in a nice way. I have one way that works here (OS X) at least, setting a signal handler and signalling the process from another thread like so:
import processing
from processing.connection import Listener
import threading
import time
import os
import signal
import socket
import errno
# This is actually called by the connection handler.
def closeme():
time.sleep(1)
print 'Closing socket...'
listener.close()
os.kill(processing.currentProcess().getPid(), signal.SIGPIPE)
oldsig = signal.signal(signal.SIGPIPE, lambda s, f: None)
listener = Listener('/tmp/asdf', 'AF_UNIX')
# This is a thread that handles one already accepted connection, left out for brevity
threading.Thread(target=closeme).start()
print 'Accepting...'
try:
listener.accept()
except socket.error, e:
if e.args[0] != errno.EINTR:
raise
# Cleanup here...
print 'Done...'
The only other way I've thought about is reaching deep into the connection (listener._listener._socket) and setting the non-blocking option...but that probably has some side effects and is generally really scary.
Does anyone have a more elegant (and perhaps even correct!) way of accomplishing this? It needs to be portable to OS X, Linux and BSD, but Windows portability etc is not necessary.
Clarification:
Thanks all! As usual, ambiguities in my original question are revealed :)
I need to perform cleanup after I have cancelled the listening, and I don't always want to actually exit that process.
I need to be able to access this process from other processes not spawned from the same parent, which makes Queues unwieldy
The reasons for threads are that:
They access a shared state. Actually more or less a common in-memory database, so I suppose it could be done differently.
I must be able to have several connections accepted at the same time, but the actual threads are blocking for something most of the time. Each accepted connection spawns a new thread; this in order to not block all clients on I/O ops.
Regarding threads vs. processes, I use threads for making my blocking ops non-blocking and processes to enable multiprocessing.
Isnt that what select is for??
Only call accept on the socket if the select indicates it will not block...
The select has a timeout, so you can break out occasionally occasionally to check
if its time to shut down....
I thought I could avoid it, but it seems I have to do something like this:
from processing import connection
connection.Listener.fileno = lambda self: self._listener._socket.fileno()
import select
l = connection.Listener('/tmp/x', 'AF_UNIX')
r, w, e = select.select((l, ), (), ())
if l in r:
print "Accepting..."
c = l.accept()
# ...
I am aware that this breaks the law of demeter and introduces some evil monkey-patching, but it seems this would be the most easy-to-port way of accomplishing this. If anyone has a more elegant solution I would be happy to hear it :)
I'm new to the multiprocessing module, but it seems to me that mixing the processing module and the threading module is counter-intuitive, aren't they targetted at solving the same problem?
Anyway, how about wrapping your listen functions into a process itself? I'm not clear how this affects the rest of your code, but this may be a cleaner alternative.
from multiprocessing import Process
from multiprocessing.connection import Listener
class ListenForConn(Process):
def run(self):
listener = Listener('/tmp/asdf', 'AF_UNIX')
listener.accept()
# do your other handling here
listen_process = ListenForConn()
listen_process.start()
print listen_process.is_alive()
listen_process.terminate()
listen_process.join()
print listen_process.is_alive()
print 'No more listen process.'
Probably not ideal, but you can release the block by sending the socket some data from the signal handler or the thread that is terminating the process.
EDIT: Another way to implement this might be to use the Connection Queues, since they seem to support timeouts (apologies, I misread your code in my first read).
I ran into the same issue. I solved it by sending a "stop" command to the listener. In the listener's main thread (the one that processes the incoming messages), every time a new message is received, I just check to see if it's a "stop" command and exit out of the main thread.
Here's the code I'm using:
def start(self):
"""
Start listening
"""
# set the command being executed
self.command = self.COMMAND_RUN
# startup the 'listener_main' method as a daemon thread
self.listener = Listener(address=self.address, authkey=self.authkey)
self._thread = threading.Thread(target=self.listener_main, daemon=True)
self._thread.start()
def listener_main(self):
"""
The main application loop
"""
while self.command == self.COMMAND_RUN:
# block until a client connection is recieved
with self.listener.accept() as conn:
# receive the subscription request from the client
message = conn.recv()
# if it's a shut down command, return to stop this thread
if isinstance(message, str) and message == self.COMMAND_STOP:
return
# process the message
def stop(self):
"""
Stops the listening thread
"""
self.command = self.COMMAND_STOP
client = Client(self.address, authkey=self.authkey)
client.send(self.COMMAND_STOP)
client.close()
self._thread.join()
I'm using an authentication key to prevent would be hackers from shutting down my service by sending a stop command from an arbitrary client.
Mine isn't a perfect solution. It seems a better solution might be to revise the code in multiprocessing.connection.Listener, and add a stop() method. But, that would require sending it through the process for approval by the Python team.

Categories

Resources