I cannot use multiprocessing in IPython - python

Last year I wrote a simple code to mosaic hundreds of raster data. It went well.
Last week I picked up this code to do the same work, but it didn't work.
I copy the official demo:
from multiprocessing import Process
def f(name):
print 'hello', name
if __name__ == '__main__':
p = Process(target=f, args=('bob',))
p.start()
p.join()
But there is no feedback, no report error. If I switch multiprocessing module to Thread module, it goes well.
Then I find out that the the demo can be ran in Python console but not in IPython console.
PS, I use WinPython-64bit, and have tried 2.7 and 3.5 version, both have the same problem. And I cannot use multiprocessing module in the ArcGIS python console.
Thanks

I have not understand your question that much.
but, if you want to use Thread, you can try this.
from multiprocessing.dummy import Pool as ThreadPool
def f(name):
print 'hello', name
if __name__ == '__main__':
pool = ThreadPool(4) # 4 is number of your CPU core, 4 times faster
names = ['bob','jack','newt'] #prepare your name list
results = pool.map(f, names)
pool.close()
pool.join()

Related

Python multiprocessing with macOs

I have a mac (MacOs 10.15.4, Python ver 3.82) and need to work in multiprocessing, but on my pc the procedures doesn’t work.
For example, I have copied a simple parallel python program
import multiprocessing as mp
import time
def test_function(i):
print("function starts" + str(i))
time.sleep(1)
print("function ends" + str(i))
if __name__ == '__main__':
pool = mp.Pool(mp.cpu_count())
pool.map(test_function, [i for i in range(4)])
pool.close()
pool.join()
What I expect to see in the output:
function starts0
function ends0
function starts1
function ends1
function starts2
function ends2
function starts3
function ends3
Or similar...
What I actually see:
= RESTART: /Users/Simulazioni/prova.py
>>>
Just nothing, no errors and no informations, just nothing. I have already try mamy procedure without results. The main problem, I could see, is the call of the function, in fact the instruction:
if __name__ == '__main__':
doesn’t call the function,
def test_function(i):
I tried many example of that kind without results.
Is ti possible and/or what's is the easiest way to parallelize in macOs?
I know this is a bit of an old question, but I just faced the same issue and I solved it by using a different multithreading package's version, which is:
import multiprocess
instead of:
import multiprocessing
Reference:
https://pypi.org/project/multiprocess/

Use python's multiprocessing library in Rust

I use Rust to speed up a data processing pipeline, but I have to run some existing Python code as-is, which I want to parallelize. Following discussion in another question, creating multiple Python processes is a possible approach given my project's specific constraints. However, running the code below gives an infinite loop. I can't quite understand why.
use cpython::Python;
fn main() {
let gil = Python::acquire_gil();
let py = gil.python();
py.run(r#"
import sys
from multiprocessing import Process
def f(name):
print('hello', name)
if __name__ == '__main__':
print('start')
sys.argv=['']
p = Process(target=f, args=('bob',))
p.start()
p.join()
"#, None,None).unwrap();
}
Output (continues until Ctrl-C):
start
start
start
start
start
start
start
start
EDIT
As mentioned in the comments below, I gave up on trying to create processes from the Python code. The interference between Windows, the Python multiprocessing module, and how processes are created with Rust are too obscure to manage properly.
So instead I will create and manage them from Rust. The code is therefore more textbook:
use std::process::Command;
fn main() {
let mut cmd = Command::new("python");
cmd.args(&["-c", "print('test')"]);
let process = cmd.spawn().expect("Couldn't spawn process.");
println!("{:?}", process.wait_with_output().unwrap());
}
I can't reproduce this; for me it just prints start and then hello bob as expected. For whatever reason, it seems that in your case, __name__ is always equal to "__main__" and you get this infinite recursion. I'm using the cpython crate version v0.4.1 and Python 3.8.1 on Arch Linux.
A workaround is to not depend on __name__ at all, but to instead define your Python code as a module with a main() function and then call that function:
use cpython::{Python, PyModule};
fn main() {
let gil = Python::acquire_gil();
let py = gil.python();
let module = PyModule::new(py, "bob").unwrap();
py.run(r#"
import sys
from multiprocessing import Process
def f(name):
print('hello', name)
def main():
print('start')
sys.argv=['']
p = Process(target=f, args=('bob',))
p.start()
p.join()
"#, Some(&module.dict(py)), None).unwrap();
module.call(py, "main", cpython::NoArgs, None).unwrap();
}

Issue with Python's Multiprocessing in Google Colab

I need to do some multiprocessing with my Python scripts and I decided to give it a try with Google's collaboratory.
I've connected to local runtime and tried to run the following script:
import multiprocessing
def spawn(num):
print('Spawned! {}'.format(num))
if __name__ == '__main__':
for i in range(5):
p = multiprocessing.Process(target=spawn, args=(i,))
p.start()
However, when I run this, nothing happens. Absolutely nothing, no errors, no prints, it just executes instantly and that's it.
Am I missing something? Does multiprocessing work with Google Colab local runtime?
Thanks in advance.
Run this instead
import multiprocessing
def spawn(num):
print('Spawned! {}'.format(num))
for i in range(5):
p = multiprocessing.Process(target=spawn, args=(i,))
p.start()

Python multiprocessing.Pool does not start right away

I want to input text to python and process it in parallel. For that purpose I use multiprocessing.Pool. The problem is that sometime, not always, I have to input text multiple times before anything is processed.
This is a minimal version of my code to reproduce the problem:
import multiprocessing as mp
import time
def do_something(text):
print('Out: ' + text, flush=True)
# do some awesome stuff here
if __name__ == '__main__':
p = None
while True:
message = input('In: ')
if not p:
p = mp.Pool()
p.apply_async(do_something, (message,))
What happens is that I have to input text multiple times before I get a result, no matter how long I wait after I have inputted something the first time. (As stated above, that does not happen every time.)
python3 test.py
In: a
In: a
In: a
In: Out: a
Out: a
Out: a
If I create the pool before the while loop or if I add time.sleep(1) after creating the pool, it seems to work every time. Note: I do not want to create the pool before I get an input.
Has someone an explanation for this behavior?
I'm running Windows 10 with Python 3.4.2
EDIT: Same behavior with Python 3.5.1
EDIT:
An even simpler example with Pool and also ProcessPoolExecutor. I think the problem is the call to input() right after appyling/submitting, which only seems to be a problem the first time appyling/submitting something.
import concurrent.futures
import multiprocessing as mp
import time
def do_something(text):
print('Out: ' + text, flush=True)
# do some awesome stuff here
# ProcessPoolExecutor
# if __name__ == '__main__':
# with concurrent.futures.ProcessPoolExecutor() as executor:
# executor.submit(do_something, 'a')
# input('In:')
# print('done')
# Pool
if __name__ == '__main__':
p = mp.Pool()
p.apply_async(do_something, ('a',))
input('In:')
p.close()
p.join()
print('done')
Your code works when I tried it on my Mac.
In Python 3, it might help to explicitly declare how many processors will be in your pool (ie the number of simultaneous processes).
try using p = mp.Pool(1)
import multiprocessing as mp
import time
def do_something(text):
print('Out: ' + text, flush=True)
# do some awesome stuff here
if __name__ == '__main__':
p = None
while True:
message = input('In: ')
if not p:
p = mp.Pool(1)
p.apply_async(do_something, (message,))
I could not reproduce it on Windows 7 but there are few long shots worth to mention for your issue.
your AV might be interfering with the newly spawned processes, try temporarily disabling it and see if the issue is still present.
Win 10 might have different IO caching algorithm, try inputting larger strings. If it works, it means that the OS tries to be smart and sends data when a certain amount has piled up.
As Windows has no fork() primitive, you might see the delay caused by the spawn starting method.
Python 3 added a new pool of workers called ProcessPoolExecutor, I'd recommend you to use this no matter the issue you suffer from.

Collecting result from different process in python

I am doing several process togethere. Each of the proccess returns some results. How would I collect those results from those process.
task_1 = Process(target=do_this_task,args=(para_1,para_2))
task_2 = Process(target=do_this_task,args=(para_1,para_2))
do_this_task returns some results. I would like to collect those results and save them in some variable.
So right now I would suggest you should use the python multiprocessing module's Pool as it handles quite a bit for you. Could you elaborate what you're doing and why you want to use what I assume to be multiprocessing.Process directly?
If you still want to use multiprocessing.Process directly you should use a Queue to get the return values.
example given in the docs:
"
from multiprocessing import Process, Queue
def f(q):
q.put([42, None, 'hello'])
if __name__ == '__main__':
q = Queue()
p = Process(target=f, args=(q,))
p.start()
print q.get() # prints "[42, None, 'hello']"
p.join()
"-Multiprocessing Docs
So processes are things that usually run in the background to do something in general, if you do multiprocessing with them you need to 'throw around' the data since processes don't have shared memory like threads - so that's why you use the Queue - it does it for you. Another thing you can do is pipes, and conveniently they give an example for that as well :).
"
from multiprocessing import Process, Pipe
def f(conn):
conn.send([42, None, 'hello'])
conn.close()
if __name__ == '__main__':
parent_conn, child_conn = Pipe()
p = Process(target=f, args=(child_conn,))
p.start()
print parent_conn.recv() # prints "[42, None, 'hello']"
p.join()
"
-Multiprocessing Docs
what this does is manually use pipes to throw around the finished results to the 'parent process' in this case.
Also sometimes I find cases which multiprocessing cannot pickle well so I use this great answer (or my modified specialized variants of) by mrule that he posts here:
"
from multiprocessing import Process, Pipe
from itertools import izip
def spawn(f):
def fun(pipe,x):
pipe.send(f(x))
pipe.close()
return fun
def parmap(f,X):
pipe=[Pipe() for x in X]
proc=[Process(target=spawn(f),args=(c,x)) for x,(p,c) in izip(X,pipe)]
[p.start() for p in proc]
[p.join() for p in proc]
return [p.recv() for (p,c) in pipe]
if __name__ == '__main__':
print parmap(lambda x:x**x,range(1,5))
"
you should be warned however that this takes over control manually of the processes so certain things can leave 'dead' processes lying around - which is not a good thing, an example being unexpected signals - this is an example of using pipes for multi-processing though :).
If those commands are not in python, e.g. you want to run ls then you might be better served by using subprocess, as os.system isn't a good thing to use anymore necessarily as it is now considered that subprocess is an easier-to-use and more flexible tool, a small discussion is presented here.
You can do something like this with multiprocessing
from multiprocessing import Pool
mydict = {}
with Pool(processes=5) as pool:
task_1 = pool.apply_async(do_this_task,args=(para_1,para_2))
task_2 = pool.apply_async(do_this_task,args=(para_1,para_2))
mydict.update({"task_1": task_1.get(), "task_2":task_2.get()})
print(mydict)
or if you would like to try multithreading with concurrent.futures then take a look at this answer.
If the processes are external scripts then try using the subprocess module. However, your code suggests you want to run functions in parallel. For this, try the multiprocessing module. Some code from this answer for specific details of using multiprocessing:
def foo(bar, baz):
print 'hello {0}'.format(bar)
return 'foo' + baz
from multiprocessing.pool import ThreadPool
pool = ThreadPool(processes=1)
async_result = pool.apply_async(foo, ('world', 'foo')) # tuple of args for foo
# do some other stuff in the other processes
return_val = async_result.get() # get the return value from your function.

Categories

Resources