Python multiprocessing pipe recv() doc unclear or did I miss anything? - python

I have been learning how to use the Python multiprocessing module recently, and reading the official docs. In 16.6.1.2. Exchanging objects between processes there is a simple example of using a pipe to exchange data.
And in 16.6.2.4. Connection Objects, there is this statement: "Raises EOFError if there is nothing left to receive and the other end was closed."
So, I revised the example as shown below. IMHO this should trigger an EOFError exception: nothing sent and the sending end is closed.
The revised code:
from multiprocessing import Process, Pipe

def f(conn):
    #conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    #print parent_conn.recv()   # prints "[42, None, 'hello']"
    try:
        print parent_conn.recv()
    except EOFError:
        pass
    p.join()
But when I tried the revised example on my Ubuntu 11.04 machine, Python 2.7.2, the script hangs.
If anyone can point out what I missed, I would appreciate it.

When you start a new process with mp.Process, the child process inherits the pipes of the parent. When the child closes conn, the parent process still has child_conn open, so the reference count for the pipe file descriptor is still greater than 0, and so EOFError is not raised.
To get the EOFError, close the end of the pipe in both the parent and child processes:
import multiprocessing as mp

def foo_pipe(conn):
    conn.close()

def pipe():
    conn = mp.Pipe()
    parent_conn, child_conn = conn
    proc = mp.Process(target=foo_pipe, args=(child_conn,))
    proc.start()
    child_conn.close()  # <-- Close the child_conn end in the main process too.
    try:
        print(parent_conn.recv())
    except EOFError as err:
        print('Got here')
    proc.join()

if __name__ == '__main__':
    pipe()

Related

Evaluating data passed through multiprocess pipe

I noticed that data received through a multiprocessing pipe cannot be evaluated directly. In the example below, the code gets stuck in the child process.
import multiprocessing as mp

def child(conn):
    while True:
        if conn.recv() == 1:
            conn.send(1)
        if conn.recv() == 2:
            conn.send(2)
    conn.close()

def main():
    parent_conn, child_conn = mp.Pipe()
    p = mp.Process(target=child, args=(child_conn,))
    p.start()
    while True:
        parent_conn.send(1)
        print(parent_conn.recv())
    p.join()

if __name__ == '__main__':
    main()
But if I assign conn.recv() to a variable in the child process, as shown below, then everything works.
def child(conn):
    while True:
        x = conn.recv()
        if x == 1:
            conn.send(1)
        if x == 2:
            conn.send(2)
    conn.close()
I assume this is because the parent and child processes are running concurrently, so the data being passed should only be evaluated as it is received. Is this the cause?
I am running Python 3.7 on Windows 10.
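
For reference, each conn.recv() call consumes a separate message, so the original child reads two messages per loop iteration (one in each if) while the parent sends and expects only one. A minimal sketch illustrating that (illustration only, not the original code):

from multiprocessing import Pipe

a, b = Pipe()
a.send(1)
a.send(2)
print(b.recv())  # 1 -- the first recv() consumes the first message
print(b.recv())  # 2 -- a second recv() consumes the next message, not the same one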

Subprocess.Popen get output even in case of timeout

I am using subprocess.Popen and Popen.communicate to run a process with a timeout, similar to the suggestion given in this question:
Subprocess timeout failure
Thus I am using the following code:
with Popen(["strace","/usr/bin/wireshark"], stdout=subprocess.PIPE,stderr=subprocess.PIPE, preexec_fn=os.setsid) as process:
try:
output = process.communicate(timeout=2)[0]
print("Output try", output)
except TimeoutExpired:
os.killpg(process.pid, signal.SIGINT) # send signal to the process group
output = process.communicate()[0]
print("Output except",output)
return output
And I get the following output:
Output except b''
and b'' as a return value.
How can I get the output of the process (the output until it is killed) even though the TimeoutExpired exception is raised?
Use kill to terminate the process:
import subprocess
import os

def foo():
    with subprocess.Popen(["strace", "/usr/bin/wireshark"], stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, preexec_fn=os.setsid) as process:
        try:
            return process.communicate(timeout=2)[0]
        except subprocess.TimeoutExpired:
            process.kill()
            return process.communicate()[0]

output = foo()
If this does not work for you (it does for me), then:
Instead of calling communicate, start a thread that loops reading chunks from process.stdout and writes them to a queue.Queue instance passed to it. The main thread then calls process.wait(timeout=2) to await termination. If subprocess.TimeoutExpired is raised, the process is killed. Once we know the process is no longer running and the thread has written all the chunks it will ever write, we call get_nowait on the queue in a loop until it is empty, and finally concatenate the chunks together:
import subprocess
from threading import Thread
from queue import Queue, Empty  # Empty is needed for the drain loop below
import os

def foo():
    def reader(stream, queue):
        # Read the pipe in chunks and hand them to the queue until EOF.
        while True:
            data = stream.read(4096)
            if data == b'':
                break
            queue.put(data)

    with subprocess.Popen(["strace", "/usr/bin/wireshark"], stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, preexec_fn=os.setsid) as process:
        queue = Queue()
        t = Thread(target=reader, args=(process.stdout, queue))
        t.start()
        try:
            process.wait(timeout=2)
        except subprocess.TimeoutExpired:
            process.kill()
        # Never do a join on a process writing to a multiprocessing.Queue
        # before you retrieve all items from the queue. But here we are OK:
        t.join()
        output = []
        try:
            while True:
                output.append(queue.get_nowait())
        except Empty:
            pass
        return b''.join(output)

output = foo()
Popen objects have a kill method, so why not use that instead? After a process is killed, you can just read its stdout. Does this work?
with Popen(["strace","/usr/bin/wireshark"], stdout=subprocess.PIPE,stderr=subprocess.PIPE, preexec_fn=os.setsid) as process:
try:
output = process.communicate(timeout=2)[0]
print("Output try", output)
except TimeoutExpired:
process.kill()
output = process.stdout.read().decode()
print("Output except",output)
return output
According to the documentation for Popen.communicate:
If the process does not terminate after timeout seconds, a TimeoutExpired exception will be raised. Catching this exception and retrying communication will not lose any output.
Your code worked perfectly for me; the problem is that your command writes its log to stderr only, so stdout is empty. See this: Determine if output is stdout or stderr
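Since strace writes its trace to stderr rather than stdout, a sketch of one way to actually capture it (assuming the same command as in the question) is to merge stderr into stdout, or to read the stderr element of the communicate() tuple:

import os
import signal
import subprocess

with subprocess.Popen(["strace", "/usr/bin/wireshark"],
                      stdout=subprocess.PIPE,
                      stderr=subprocess.STDOUT,   # merge stderr into stdout
                      preexec_fn=os.setsid) as process:
    try:
        output = process.communicate(timeout=2)[0]
    except subprocess.TimeoutExpired:
        os.killpg(process.pid, signal.SIGINT)     # stop the whole process group
        output = process.communicate()[0]
print(output)

# Alternatively, keep the streams separate and look at the second element:
# stdout_data, stderr_data = process.communicate(timeout=2)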

How to get id of process from which came signal in Python?

Please see the following Python code:
def Handler(signum, frame):
    pass  # do something

signal.signal(signal.SIGCHLD, Handler)
Is there a way to get the ID of the process the signal came from?
Or is there another way to get the sender's process ID without blocking the main flow of the application?
You cannot directly. The signal module of the Python standard library has no provision for giving access to the POSIX siginfo_t structure, which is what carries the sender's PID. If you really need that, you will have to build a Python extension in C or C++.
You will find pointers for that in Extending and Embedding the Python Interpreter; this document should also be available in your Python distribution.
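As an aside (assuming Python 3.3+ on Unix, not the Python 2 setup above): signal.sigwaitinfo() does expose the sender's PID via the si_pid field without a C extension, although it blocks the calling thread until a signal arrives, so it suits a dedicated waiting loop rather than an asynchronous handler. A minimal sketch:

import os
import signal
from multiprocessing import Process

def child():
    pass  # exiting sends SIGCHLD to the parent

if __name__ == '__main__':
    # Block SIGCHLD so it is delivered via sigwaitinfo() instead of a handler.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})
    p = Process(target=child)
    p.start()
    info = signal.sigwaitinfo({signal.SIGCHLD})
    print('SIGCHLD came from PID', info.si_pid)  # the child's PID
    p.join()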
os.getpid() returns the current process id. So when you send a signal, you can print it out, for example.
import signal
import os
import time

def receive_signal(signum, stack):
    print 'Received:', signum

signal.signal(signal.SIGUSR1, receive_signal)
signal.signal(signal.SIGUSR2, receive_signal)

print 'My PID is:', os.getpid()
Check this for more info on signals.
To send the PID to the other process, one may use a Pipe:
import os
from multiprocessing import Process, Pipe

def f(conn):
    conn.send([os.getpid()])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print parent_conn.recv()  # prints os.getpid() of the child, in a list
    p.join()
Exchanging objects between processes

Python pipe.send() hangs on Mac OS

The following program always hangs on Mac OS (Python 2.7.5) if I return a big enough string. I can't say for sure what the limit is, but it works for smaller text.
It works fine on Ubuntu, but hangs on pipe_to_parent.send(result) on Mac OS.
Does anybody know how to fix this? Is there anything wrong with the code below?
#!/usr/bin/python
import sys
from multiprocessing import Process, Pipe

def run(text, length):
    return (text * ((length / len(text)) + 1))[:length]

def proc_func(pipe_to_parent):
    result = {'status': 1, 'log': run('Hello World', 20000), 'details': {}, 'exception': ''}
    pipe_to_parent.send(result)
    sys.exit()

def call_run():
    to_child, to_self = Pipe()
    proc = Process(target=proc_func, args=(to_self,))
    proc.start()
    proc.join()
    print(to_child.recv())
    to_child.close()
    to_self.close()

call_run()
The documentation shows an example that has some differences, as follows:
from multiprocessing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    # This is the important part
    # Note: conn.recv() is called _before_ process.join()
    print parent_conn.recv()  # prints "[42, None, 'hello']"
    p.join()
In your example, you call .recv() after you've already called process.join().
...
proc = Process(target=proc_func, args=(to_self,))
proc.start()
proc.join()
print(to_child.recv())
...
To see exactly what is happening we would have to look at the multiprocessing module code, but my guess is that the hang occurs because the child blocks in send() once the OS pipe buffer fills up, while the parent is blocked in join() waiting for the child to exit, so neither side can make progress until the parent calls recv().
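A minimal reordering of the question's call_run() along those lines (same proc_func as in the question) would be:

def call_run():
    to_child, to_self = Pipe()
    proc = Process(target=proc_func, args=(to_self,))
    proc.start()
    print(to_child.recv())   # drain the pipe first ...
    proc.join()              # ... then join, once the child's send() has completed
    to_child.close()
    to_self.close()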

Large objects and `multiprocessing` pipes and `send()`

I've recently found out that, if we create a pair of parent-child connection objects using multiprocessing.Pipe, and the object obj we're trying to send through the pipe is too large, my program hangs without throwing an exception or doing anything at all. See the code below. (The code below uses the numpy package to produce a large array of floats.)
import multiprocessing as mp
import numpy as np

def big_array(conn, size=1200):
    a = np.random.rand(size)
    print "Child process trying to send array of %d floats." % size
    conn.send(a)
    return a

if __name__ == "__main__":
    print "Main process started."
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=big_array, args=[child_conn, 1200])
    proc.start()
    print "Child process started."
    proc.join()
    print "Child process joined."
    a = parent_conn.recv()
    print "Received the following object."
    print "Type: %s. Size: %d." % (type(a), len(a))
The output is the following.
Main process started.
Child process started.
Child process trying to send array of 1200 floats.
And it hangs here indefinitely. However, if instead of 1200, we try to send an array with 1000 floats, then the program executes successfully, with the following output as expected.
Main process started.
Child process started.
Child process trying to send array of 1000 floats.
Child process joined.
Received the following object.
Type: <type 'numpy.ndarray'>. Size: 1000.
Press any key to continue . . .
This looks like a bug to me. The documentation says the following.
send(obj)
Send an object to the other end of the connection which should be read using recv().
The object must be picklable. Very large pickles (approximately 32 MB+, though it depends on the OS) may raise a ValueError exception.
But in my run, not even a ValueError exception was thrown; the program just hangs there. Moreover, the 1200-long numpy array is 9600 bytes big, certainly not more than 32 MB! This looks like a bug. Does anyone know how to solve this problem?
By the way, I'm using Windows 7, 64-bit.
Try moving join() below recv():
import multiprocessing as mp

def big_array(conn, size=1200):
    a = "a" * size
    print "Child process trying to send array of %d floats." % size
    conn.send(a)
    return a

if __name__ == "__main__":
    print "Main process started."
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=big_array, args=[child_conn, 120000])
    proc.start()
    print "Child process started."
    print "Child process joined."
    a = parent_conn.recv()
    proc.join()
    print "Received the following object."
    print "Type: %s. Size: %d." % (type(a), len(a))
But I don't really understand why your example works even for small sizes. I was thinking that writing to the pipe and then making the process join without first reading the data from the pipe would block the join. You should first receive from the pipe, then join. But apparently it does not block for small sizes...?
Edit: from the docs (http://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming):
"An example which will deadlock is the following:"
from multiprocessing import Process, Queue

def f(q):
    q.put('X' * 1000000)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    p.join()          # this deadlocks
    obj = queue.get()
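
The fix the docs suggest is to move queue.get() above p.join(), i.e. drain the queue before joining. Small payloads happen to fit in the underlying pipe buffer, so the child can flush its data and exit; a large payload blocks until the parent reads. A sketch of the fixed ordering:

from multiprocessing import Process, Queue

def f(q):
    q.put('X' * 1000000)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    obj = queue.get()   # receive first ...
    p.join()            # ... then join; no deadlock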
