Google colab: printing in child processes - python

I can't see any output in Google Colab when I use Python's Process. I tried the print function and the logging module, but neither works.
This simple example produces output on my machine (Jupyter notebook, Python 3.6.9) but doesn't work in Colab:
from multiprocessing import Process
import time

def simple_fun(proc_id):
    while True:
        time.sleep(1)
        print(proc_id)

N_PROCESS = 2
processes = []
for i in range(N_PROCESS):
    p = Process(target=simple_fun, args=(i,))
    p.start()
    processes.append(p)
Is there anything I can do?
Am I missing something? Maybe the code above is platform-dependent?

This Process example works on Google Colab (Python 3.6.9):
from multiprocessing import Process
import os

def info(title):
    print(title)
    print('module name:', __name__)
    if hasattr(os, 'getppid'):  # only available on Unix
        print('parent process:', os.getppid())
    print('process id:', os.getpid())

def f(name):
    info('function f')
    print('hello', name)

if __name__ == '__main__':
    info('main line')
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

I found that the output is missing only when I don't wait for a process with the join function.
I can't block on all processes at once.
But if the work takes approximately the same amount of time in every process, I can block on just one of them:
from multiprocessing import Process
import time
import logging

def simple_fun(proc_id):
    while True:
        time.sleep(1)
        # logging.info(proc_id)
        print(proc_id)

N_PROCESS = 2
processes = []
for i in range(N_PROCESS):
    p = Process(target=simple_fun, args=(i,))
    p.start()
    processes.append(p)

processes[0].join()  # <- wait for one of the processes
This one works.
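If blocking on a single process feels fragile (the process you join might be the one that finishes first), a minimal sketch of an alternative is to poll all processes with short join timeouts. This assumes the same simple_fun and processes list as above; it is not from the original answer.
try:
    while any(p.is_alive() for p in processes):
        for p in processes:
            p.join(timeout=0.5)  # short timeout keeps the loop responsive
except KeyboardInterrupt:
    # Interrupting the cell stops the workers instead of leaving them orphaned.
    for p in processes:
        p.terminate()
This keeps the parent blocked, so the notebook keeps flushing the children's output, but it still returns once every child has exited.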

Related

Python multiprocessing not working when time.sleep is used

I have one function that takes different inputs, and I want that function to run in parallel. Below is what I tried, but it's not working, due to time.sleep I think.
from multiprocessing import Process, Queue
from time import sleep
import time

def f(name):
    print('hello', name)
    time.sleep(10)

l1 = Queue()
a = Process(target=f('Tom'))
a.start()
l2 = Queue()
b = Process(target=f("Stock"))
b.start()
print(l1.get())
print(l2.get())
I want the function to run in parallel. Currently it waits 10 seconds before it moves on to the second execution.
See the comment posted by @juanpa.arrivillaga. Pass just the name of the function as the target argument of the Process constructor; do not call the function. The arguments to your target are specified separately as the args argument, either as a tuple or a list.
Also, since f does not return a useful value, there is no reason to have Queue instances for the results. In any case, you are not passing the Queue instances to f and nothing is being put on the queues, so I can't understand why you would be attempting to call get against them; those calls will hang forever. The argument you do need to pass to f is the name, as follows:
from multiprocessing import Process
#from time import sleep
import time

def f(name):
    print('hello', name)
    time.sleep(10)

start_time = time.time()
a = Process(target=f, args=('Tom',))
a.start()
b = Process(target=f, args=("Stock",))
b.start()
# Wait for processes to complete
a.join()
b.join()
elapsed_time = time.time() - start_time
print(elapsed_time)
Prints:
hello Tom
hello Stock
10.212252855300903
If you were running this on a platform such as Windows, you would need to place the process-creating code in a special block, as follows:
from multiprocessing import Process
#from time import sleep
import time

def f(name):
    print('hello', name)
    time.sleep(10)

# Required on platforms that create processes with the "spawn" method:
if __name__ == '__main__':
    start_time = time.time()
    a = Process(target=f, args=('Tom',))
    a.start()
    b = Process(target=f, args=("Stock",))
    b.start()
    # Wait for processes to complete
    a.join()
    b.join()
    elapsed_time = time.time() - start_time
    print(elapsed_time)
Use Pool and starmap:
import multiprocessing
import time

def f(name, place):
    print('hello', name, place)
    time.sleep(5)

if __name__ == '__main__':
    data = [('Louis', 'val1'), ('Paul', 'val2'), ('Alexandre', 'val3'),
            ('John', 'val4'), ('Tom', 'val5'), ('Bob', 'val6')]
    with multiprocessing.Pool(2) as pool:
        pool.starmap(f, data)
Read the multiprocessing programming guidelines carefully, in particular "Safe importing of main module":
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
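As a minimal illustration of that guideline (my own sketch, not part of the original answer): under the spawn start method, the child re-imports the main module, so any unguarded Process(...).start() at module level would try to spawn recursively and Python raises a RuntimeError.
import multiprocessing as mp
import time

def f(name):
    print('hello', name)
    time.sleep(1)

# Unsafe: at module level this line would run again in every spawned child.
# mp.Process(target=f, args=('Tom',)).start()  # -> RuntimeError under "spawn"

# Safe: only the original interpreter, not the re-imported copy, starts processes.
if __name__ == '__main__':
    p = mp.Process(target=f, args=('Tom',))
    p.start()
    p.join()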

How to run 10 python programs simultaneously?

I have a_1.py ~ a_10.py.
I want to run 10 Python programs in parallel.
I tried:
from multiprocessing import Process
import os

def info(title):
    # I want to execute a Python program
    ...

def f(name):
    for i in range(1, 11):
        subprocess.Popen(['python3', f'a_{i}.py'])

if __name__ == '__main__':
    info('main line')
    p = Process(target=f)
    p.start()
    p.join()
but it doesn't work
How do I solve this?
I would suggest using the subprocess module instead of multiprocessing:
import os
import subprocess
import sys

MAX_SUB_PROCESSES = 10

def info(title):
    print(title, flush=True)

if __name__ == '__main__':
    info('main line')
    # Create a list of subprocesses.
    processes = []
    for i in range(1, MAX_SUB_PROCESSES+1):
        pgm_path = f'a_{i}.py'  # Path to Python program.
        command = f'"{sys.executable}" "{pgm_path}" "{os.path.basename(pgm_path)}"'
        process = subprocess.Popen(command, bufsize=0)
        processes.append(process)
    # Wait for all of them to finish.
    for process in processes:
        process.wait()
    print('Done')
If you just need to call 10 external .py scripts (a_1.py ~ a_10.py) as separate processes, use the subprocess.Popen class:
import subprocess, sys

for i in range(1, 11):
    subprocess.Popen(['python3', f'a_{i}.py'])
# sys.exit()  # optional
It's worth looking at the rich subprocess.Popen signature (you may find some useful parameters/options).
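For example, a few of those Popen parameters in use (a sketch of my own, assuming the same hypothetical a_{i}.py scripts):
import subprocess, sys

procs = []
for i in range(1, 11):
    p = subprocess.Popen(
        [sys.executable, f'a_{i}.py'],  # run with the current interpreter
        cwd='.',                        # working directory for the child
        stdout=subprocess.PIPE,         # capture the child's stdout
        stderr=subprocess.STDOUT,       # merge stderr into stdout
        text=True,                      # decode output as str instead of bytes
    )
    procs.append(p)

for p in procs:
    out, _ = p.communicate()            # wait for the child and collect its output
    print(p.args, '->', p.returncode)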
You can use a multiprocessing pool to run them concurrently.
import multiprocessing as mp

def worker(module_name):
    """ Executes a module externally with python """
    __import__(module_name)
    return

if __name__ == "__main__":
    max_processes = 5
    module_names = [f"a_{i}" for i in range(1, 11)]
    print(module_names)
    with mp.Pool(max_processes) as pool:
        pool.map(worker, module_names)
The max_processes variable is the maximum number of workers running at any given time; in other words, it's the number of processes spawned by your program. pool.map(worker, module_names) uses the available processes and calls worker on each item in your module_names list. We don't include the .py because we're running each module by importing it.
Note: This might not work if the code you want to run in your modules is contained inside if __name__ == "__main__" blocks. If that is the case, then my recommendation would be to move all the code in the if __name__ == "__main__" blocks of the a_{} modules into a main function. Additionally, you would have to change the worker to something like:
def worker(module_name):
    module = __import__(module_name)  # kind of like 'import module_name as module'
    module.main()
    return
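To make that concrete, here is a hedged sketch of what each a_{i}.py might look like after the refactor (the body of main is invented for illustration):
# a_1.py (hypothetical contents after moving the top-level code into main())
def main():
    # ... whatever a_1.py previously did under its __main__ guard ...
    print('a_1 is running')

if __name__ == '__main__':
    # Still runnable directly with `python3 a_1.py`.
    main()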

Python multiprocessing - problem with empty queue and pool freezing

I have problems with Python multiprocessing:
Python version 3.6.6, using the Spyder IDE on Windows 7.
1.
The queue is not being populated -> every time I try to read it, it's empty. Somewhere I read that I have to get() before the process join(), but it did not solve it.
from multiprocessing import Process, Queue

# define an example function
def fnc(i, output):
    output.put(i)

if __name__ == '__main__':
    # Define an output queue
    output = Queue()

    # Setup a list of processes that we want to run
    processes = [Process(target=fnc, args=(i, output)) for i in range(4)]
    print('created')

    # Run processes
    for p in processes:
        p.start()
    print('started')

    # Exit the completed processes
    for p in processes:
        p.join()

    print(output.empty())
    print('finished')
>>>created
>>>started
>>>True
>>>finished
I would expect output to not be empty.
if I change it from .join() to
for p in processes:
    print(output.get())
    #p.join()
it freezes
2.
The next problem I have is with pool.map(): it freezes, and it has no chance of exceeding the memory limit. I don't even know how to debug such a simple piece of code.
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)
    print('Pool created')
    # print "[0, 1, 4,..., 81]"
    print(pool.map(f, range(10)))  # it freezes here
I hope it's not a big deal to have two questions in one topic.
Apparently the problem is Spyder's IPython console. When I run both from cmd, they execute properly.
Solution
For debugging in Spyder, add .dummy to the multiprocessing import:
from multiprocessing.dummy import Process, Queue
It will not be executed in multiple processes (multiprocessing.dummy wraps the threading module), but you will get results and can actually see the output. When debugging is done, simply delete .dummy, place the code in another file, import it, and call it, for example, as a function:
multiprocessing_my.py
from multiprocessing import Process, Queue

# define an example function
def fnc(i, output):
    output.put(i)
    print(i)

def test():
    # Define an output queue
    output = Queue()

    # Setup a list of processes that we want to run
    processes = [Process(target=fnc, args=(i, output)) for i in range(4)]
    print('created')

    # Run processes
    for p in processes:
        p.start()
    print('started')

    # Exit the completed processes
    for p in processes:
        p.join()

    print(output.empty())
    print('finished')

    # Get process results from the output queue
    results = [output.get() for p in processes]
    print('get results')
    print(results)
test_mp.py, executed by selecting the code and pressing Ctrl+Enter:
import multiprocessing_my

multiprocessing_my.test()
...
In[9]: test()
created
0
1
2
3
started
False
finished
get results
[0, 1, 2, 3]

multiprocess messaging queue between functions or process python

I'm trying to understand how processes message one another; see the example below.
I use the second function to do my main job, and the queue feeds the first function from time to time so it can do its own work, no matter when it finishes. I have looked at many examples and tried different ways without success. Can anyone explain how I can do this with my example?
from multiprocessing import Process, Queue, Manager
import time

def first(a,b):
    q.get()
    print a+b
    time.sleep(3)

def second():
    for i in xrange(10):
        print "seconf func"
        k+=1
        q.put=(i,k)

if __name__ == "__main__":
    processes = []
    q = Queue()
    manager = Manager()

    p = Process(target=first, args=(a,b))
    p.start()
    processes.append(p)

    p2 = Process(target=second)
    p2.start()
    processes.append(p2)

    try:
        for process in processes:
            process.join()
    except KeyboardInterrupt:
        print "Interupt"

Python - How to pass global variable to multiprocessing.Process?

I need to terminate some processes after a while, so I've used another process that sleeps for the waiting. But the new process doesn't have access to global variables from the main process, I guess. How could I solve this, please?
Code:
import os
from subprocess import Popen, PIPE
import time
import multiprocessing

log_file = open('stdout.log', 'a')
log_file.flush()
err_file = open('stderr.log', 'a')
err_file.flush()

processes = []

def processing():
    print "processing"
    global processes
    global log_file
    global err_file
    for i in range(0, 5):
        p = Popen(['java', '-jar', 'C:\\Users\\two\\Documents\\test.jar'], stdout=log_file, stderr=err_file)  # something long running
        processes.append(p)
    print len(processes)  # returns 5

def waiting_service():
    name = multiprocessing.current_process().name
    print name, 'Starting'
    global processes
    print len(processes)  # returns 0
    time.sleep(2)
    for i in range(0, 5):
        processes[i].terminate()
    print name, 'Exiting'

if __name__ == '__main__':
    processing()
    service = multiprocessing.Process(name='waiting_service', target=waiting_service)
    service.start()
You should be using synchronization primitives.
Possibly you want to set an Event that's triggered after a while by the main (parent) process.
You may also want to wait for the processes to actually complete and join them (like you would a thread).
If you have many similar tasks, you can use a processing pool like multiprocessing.Pool.
Here is a small example of how it's done:
import multiprocessing
import time

kill_event = multiprocessing.Event()

def work(_id):
    while not kill_event.is_set():
        print "%d is doing stuff" % _id
        time.sleep(1)
    print "%d quit" % _id

def spawn_processes():
    processes = []

    # spawn 10 processes
    for i in xrange(10):
        # spawn process
        process = multiprocessing.Process(target=work, args=(i,))
        processes.append(process)
        process.start()
        time.sleep(1)

    # kill all processes by setting the kill event
    kill_event.set()

    # wait for all processes to complete
    for process in processes:
        process.join()

    print "done!"

spawn_processes()
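As a sketch of the Pool alternative mentioned above (my own addition, not part of the original answer; Python 3 syntax), terminating the whole pool after a fixed delay does the same job without a shared Event:
import multiprocessing
import time

def work(_id):
    # Simulate a long-running task.
    time.sleep(60)
    return _id

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=10)
    result = pool.map_async(work, range(10))  # submit without blocking
    time.sleep(2)                             # let the tasks run for a while
    pool.terminate()                          # stop all workers
    pool.join()
    print('terminated after 2 seconds')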
The whole problem was Windows' Python: on Windows, child processes are spawned rather than forked, so they don't see the globals set in the parent. I've switched to Linux and my script works OK.
Special thanks to @rchang for his comment:
When I tested it, in both cases the print statement came up with 5. Perhaps we have a version mismatch in some way? I tested it with Python 2.7.6 on Linux kernel 3.13.0 (Mint distribution).
