Python multiprocessing with M1 Mac

I have an M1 Mac (macOS 11.1, Python 3.8.2) and need to use multiprocessing, but the following code doesn't work.
import multiprocessing
def func(index: int):
    print(index)
manager = multiprocessing.Manager()
processes = []
for i in range(-1, 10):
    p = multiprocessing.Process(target=func,
                                args=(i,))
    processes.append(p)
    p.start()
for process in processes:
    process.join()
However, on my Intel-based Mac, it works fine.
What I expect is
-1
0
1
2
3
4
5
6
7
8
9
But instead, I got an error:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Traceback (most recent call last):
File "/Users/lance/Documents/Carleton Master Projects/Carleton-Master-Thesis/experiment.py", line 7, in <module>
manager = multiprocessing.Manager()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 57, in Manager
m.start()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/managers.py", line 583, in start
self._address = reader.recv()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
Is there any similar way (also keep it easy) to parallelize in M1-based Mac?

I'm not sure why this works on an Intel machine, but the problem is that all the multiprocessing-related code should be inside an if __name__ == '__main__': guard:
import multiprocessing
def func(index: int):
    print(index)
if __name__ == '__main__':
    manager = multiprocessing.Manager()
    processes = []
    for i in range(-1, 10):
        p = multiprocessing.Process(target=func, args=(i,))
        processes.append(p)
        p.start()
    for process in processes:
        process.join()
Since multiprocessing actually starts a new process, that process needs to import func from this module to execute its task. If you don't protect the code that starts new processes, it is executed again during this import.
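A minimal sketch of the same idea with the start method made explicit. Since Python 3.8 the default start method on macOS is spawn, which re-imports the main module in every child; requesting "spawn" here is an assumption made for illustration so the behaviour is the same on any platform. The main() wrapper and the range of three values are illustrative and not part of the original code.

```python
import multiprocessing


def func(index: int) -> None:
    # Must be defined at module level so a spawned child can import it.
    print(index)


def main() -> list:
    # Ask for the "spawn" start method explicitly (illustrative choice).
    ctx = multiprocessing.get_context("spawn")
    processes = [ctx.Process(target=func, args=(i,)) for i in range(3)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    # An exit code of 0 means the child ran func without raising.
    return [p.exitcode for p in processes]


if __name__ == "__main__":
    # Only the parent process enters this block. A spawned child imports the
    # module under a different name, so it skips the block and never tries
    # to start processes of its own.
    print("default start method:", multiprocessing.get_start_method())
    print("exit codes:", main())
```

If the two calls under the guard are moved to module level, each spawned child runs them again on import, which is what produces the "bootstrapping phase" RuntimeError shown in the question.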

Related

Python3 multiprocessing Pool and Lock error

I have code similar to what is shown below. What is the proper way to run this without errors? I get a shared-memory error, and the GUI also opens many times.
mainapp = ProcessFiles()
p = multiprocessing.Pool()
p.map(mainapp.getPdfInfo, Files_list)
p.close()
class ProcessFiles:
    def __init__(self):
        self.lock = multiprocessing.Lock()
    def getPdfInfo(self, file):
        # READ FILES DATA AND DO SOME STUFF
        self.lock.acquire()
        # INSERT DATA TO DATABASE
        self.lock.release()
And this is the error message:
TypeError: can't pickle sqlite3.Connection objects
I also tried with multiprocessing.Manager() and also got errors. The code is shown below:
mainapp = ProcessFiles()
p = multiprocessing.Pool()
p.map(mainapp.getPdfInfo, Files_list)
p.close()
class ProcessFiles:
    def __init__(self):
        m = multiprocessing.Manager()
        self.lock = m.Lock()
    def getPdfInfo(self, file):
        # READ FILES DATA AND DO SOME STUFF
        self.lock.acquire()
        # INSERT DATA TO DATABASE
        self.lock.release()
And that's the error message:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
++++++++++++++++++++++++++++++
_check_not_importing_main()
File "C:\Python37\lib\multiprocessing\spawn.py", line 136, in
_check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.

OSError (Errno 9) when using multiprocessing.Array in Python

I'm trying to use a multiprocessing.Array in two separate processes in Python 3.7.4 (macOS 10.14.6). I start off by creating a new process using the spawn context, passing as an argument to it an Array object:
import multiprocessing, time, ctypes
def fn(c):
    time.sleep(1)
    print("value:", c.value)
def main():
    ctx = multiprocessing.get_context("spawn")
    arr = multiprocessing.Array(ctypes.c_char, 32)
    p = ctx.Process(target=fn, args=(arr,))
    p.start()
    arr.value = b"hello"
    p.join()
if __name__ == "__main__":
    main()
However, when I try to read it, I get the following error:
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/Users/federico/Workspace/test/test.py", line 6, in fn
print("value:", c.value)
File "<string>", line 3, in getvalue
OSError: [Errno 9] Bad file descriptor
The expected output, however, is value: hello. Anyone know what could be going wrong here? Thanks.
The array must be created from the same context that you use to create the process (ctx.Array rather than multiprocessing.Array), like so:
import multiprocessing, time
import ctypes
def fn(arr):
    time.sleep(1)
    print("value:", arr.value)
def main():
    ctx = multiprocessing.get_context("spawn")
    # create the shared array from the spawn context, not from the module default
    arr = ctx.Array(ctypes.c_char, 32)
    p = ctx.Process(target=fn, args=(arr,))
    p.start()
    arr.value = b'hello'
    p.join()
if __name__ == "__main__":
    main()

Multiprocessing simple function doesn't work but why

I am trying to multiprocess system commands, but can't get it to work with a simple program. The function runit(cmd) works fine though...
#!/usr/bin/python3
from subprocess import call, run, PIPE,Popen
from multiprocessing import Pool
import os
pool = Pool()
def runit(cmd):
    proc = Popen(cmd, shell=True,stdout=PIPE, stderr=PIPE, universal_newlines=True)
    return proc.stdout.read()
#print(runit('ls -l'))
it = []
for i in range(1,3):
    it.append('ls -l')
results = pool.map(runit, it)
It outputs:
Process ForkPoolWorker-1:
Process ForkPoolWorker-2:
Traceback (most recent call last):
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.5/multiprocessing/pool.py", line 108, in worker
task = get()
File "/usr/lib/python3.5/multiprocessing/queues.py", line 345, in get
return ForkingPickler.loads(res)
AttributeError: Can't get attribute 'runit' on <module '__main__' from './syscall.py'>
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.5/multiprocessing/pool.py", line 108, in worker
task = get()
File "/usr/lib/python3.5/multiprocessing/queues.py", line 345, in get
return ForkingPickler.loads(res)
AttributeError: Can't get attribute 'runit' on <module '__main__' from './syscall.py'>
Then it somehow waits and does nothing, and when I press Ctrl+C a few times it spits out:
^CProcess ForkPoolWorker-4:
Process ForkPoolWorker-6:
Traceback (most recent call last):
File "./syscall.py", line 17, in <module>
Process ForkPoolWorker-5:
results = pool.map(runit, it)
File "/usr/lib/python3.5/multiprocessing/pool.py", line 260, in map
...
buf = self._recv(4)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
KeyboardInterrupt
I'm not sure, since the issue I know of is Windows-related (and I don't have access to a Linux box to reproduce it), but to be portable you have to wrap your multiprocessing-dependent commands in if __name__=="__main__", or they conflict with the way Python spawns processes. This fixed example runs fine on Windows (and should work on other platforms as well):
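from subprocess import Popen, PIPE
from multiprocessing import Pool
def runit(cmd):
    # run the command in a shell and return its standard output as text
    proc = Popen(cmd, shell=True, stdout=PIPE, stderr=PIPE, universal_newlines=True)
    out, _ = proc.communicate()  # read both pipes so a full stderr buffer cannot block the child
    return out
#print(runit('ls -l'))
it = []
for i in range(1,3):
    it.append('ls -l')
if __name__=="__main__":
    # all calls to multiprocessing module are "protected" by this directive
    pool = Pool()
    results = pool.map(runit, it)
    print(results)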
(Studying the error messages more closely, now I'm pretty sure that just moving pool = Pool() after the declaration of runit would fix it as well on Linux, but wrapping in __main__ fixes+makes it portable)
That said, note that each of your workers just launches a new process, so you'd be better off with a thread pool (see "Threading pool similar to the multiprocessing Pool?"): threads that create processes, like this:
from subprocess import Popen, PIPE
from multiprocessing.pool import ThreadPool # uses threads, not processes
def runit(cmd):
    proc = Popen(cmd, shell=True, stdout=PIPE, stderr=PIPE, universal_newlines=True)
    out, _ = proc.communicate()  # read both pipes so a full stderr buffer cannot block the child
    return out
it = []
for i in range(1,3):
    it.append('ls -l')
if __name__=="__main__":
    pool = ThreadPool() # ThreadPool instead of Pool
    results = pool.map(runit, it)
    print(results)
The latter solution is more lightweight and less issue-prone (multiprocessing is a delicate module to handle). You'll be able to work with objects, shared data, etc. without needing a Manager object, among other advantages.
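As a rough illustration of the "shared data without a Manager" point, here is a sketch in which worker threads write into an ordinary dict guarded by a threading.Lock. The names run_and_record, results and results_lock are hypothetical, and the command run through sys.executable is only a portable stand-in for 'ls -l'.

```python
import subprocess
import sys
import threading
from multiprocessing.pool import ThreadPool

results = {}                      # plain dict, visible to every worker thread
results_lock = threading.Lock()   # guards writes to the shared dict


def run_and_record(n):
    # Each thread launches its own child process and waits for its output.
    out = subprocess.run(
        [sys.executable, "-c", f"print({n} * {n})"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    with results_lock:
        results[n] = out.strip()
    return out


if __name__ == "__main__":
    with ThreadPool(4) as pool:
        pool.map(run_and_record, range(5))
    print(results)
```

Because the workers are threads in one process, nothing has to be pickled and the dict is genuinely shared. This suits work that mostly waits on child processes or I/O; it does not speed up CPU-bound Python code.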

python tempfile and multiprocessing pool error

I'm experimenting with python's multiprocessing. I struggled with a bug in my code and managed to narrow it down. However, I still don't know why this happens. What I'm posting is just sample code. If I import tempfile module and change tempdir, the code crashes at pool creation. I'm using python 2.7.5
Here's the code
from multiprocessing import Pool
import tempfile
tempfile.tempdir = "R:/" #REMOVING THIS LINE FIXES THE ERROR
def f(x):
    return x*x
if __name__ == '__main__':
    pool = Pool(processes=4) # start 4 worker processes
    result = pool.apply_async(f, [10]) # evaluate "f(10)" asynchronously
    print result.get(timeout=1) # prints "100" unless your computer is *very* slow
    print pool.map(f, range(10)) # prints "[0, 1, 4,..., 81]"
Here's the error
R:\>mp_pool_test.py
Traceback (most recent call last):
File "R:\mp_pool_test.py", line 11, in <module>
pool = Pool(processes=4) # start 4 worker processes
File "C:\Python27\lib\multiprocessing\__init__.py", line 232, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild)
File "C:\Python27\lib\multiprocessing\pool.py", line 138, in __init__
self._setup_queues()
File "C:\Python27\lib\multiprocessing\pool.py", line 233, in _setup_queues
self._inqueue = SimpleQueue()
File "C:\Python27\lib\multiprocessing\queues.py", line 351, in __init__
self._reader, self._writer = Pipe(duplex=False)
File "C:\Python27\lib\multiprocessing\__init__.py", line 107, in Pipe
return Pipe(duplex)
File "C:\Python27\lib\multiprocessing\connection.py", line 223, in Pipe
1, obsize, ibsize, win32.NMPWAIT_WAIT_FOREVER, win32.NULL
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect
This code works fine.
from multiprocessing import Pool
import tempfile as TF
TF.tempdir = "R:/"
def f(x):
    return x*x
if __name__ == '__main__':
    print("test")
The bizarre thing is that, both times I don't do anything with TF.tempdir, but the one with the Pool doesn't work for some reason.
It looks like you have a name collision, from what I can see in
"C:\Program Files\PYTHON\Lib\multiprocessing\connection.py"
It seems that multiprocessing uses tempfile as well.
That behavior should not happen, but it looks to me like the problem is in line 66 of connection.py:
elif family == 'AF_PIPE':
    return tempfile.mktemp(prefix=r'\\.\pipe\pyc-%d-%d-' %
                           (os.getpid(), _mmap_counter.next()))
I am still poking at this. I looked at globals after importing tempfile and then tempfile as TF; different names exist, but now I am wondering about references, and so am trying to figure out whether they point to the same thing.
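On the question of references, a small check can settle it. The sketch below is written for Python 3 (the question uses 2.7, but module aliasing works the same way there), and it uses the current working directory as a stand-in for "R:/". It shows that import tempfile as TF binds a second name to the same module object, so assigning TF.tempdir changes the value that any other code calling tempfile.mktemp will see.

```python
import os
import tempfile
import tempfile as TF

# Both names refer to the single module object cached in sys.modules.
same_module = TF is tempfile

saved = tempfile.tempdir                  # usually None until first use
try:
    TF.tempdir = os.getcwd()              # assign through the alias
    seen_by_other_name = tempfile.tempdir # read back through the original name
    # mktemp only builds a name under tempdir; no file is created.
    candidate = tempfile.mktemp(prefix="pyc-")
finally:
    tempfile.tempdir = saved              # restore the global setting

print(same_module)
print(seen_by_other_name == os.getcwd())
print(os.path.dirname(candidate) == os.getcwd())
```

So the alias does not isolate the setting: any library code that builds names with tempfile.mktemp without passing dir picks up the overridden tempdir, which fits the observation that only the version that creates a Pool fails.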

Race condition using multiprocessing and threading together

I wrote this sample program.
It creates 8 threads and spawns a process in each one:
import threading
from multiprocessing import Process
def fast_function():
    pass
def thread_function():
    process_number = 1
    print 'start %s processes' % process_number
    for i in range(process_number):
        p = Process(target=fast_function, args=())
        p.start()
        p.join()
def main():
    threads_number = 8
    print 'start %s threads' % threads_number
    threads = [threading.Thread(target=thread_function, args=())
               for i in range(threads_number)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
It crashes with several exceptions like this
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "./repeat_multiprocessing_bug.py", line 15, in thread_function
p.start()
File "/usr/lib/python2.6/multiprocessing/process.py", line 99, in start
_cleanup()
File "/usr/lib/python2.6/multiprocessing/process.py", line 53, in _cleanup
if p._popen.poll() is not None:
File "/usr/lib/python2.6/multiprocessing/forking.py", line 106, in poll
pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 10] No child processes
Python version 2.6.5. Can somebody explain what I'm doing wrong?
You're probably trying to run it from the interactive interpreter. Try writing your code to a file and running it as a Python script; it works on my machine...
See the explanation and examples at the Python multiprocessing docs.
The multiprocessing module has a thread-safety issue in 2.6.5. Your best bet is updating to a newer Python, or applying this patch to 2.6.5: http://hg.python.org/cpython/rev/41aef062d529/
The bug is described in more detail in the following links:
http://bugs.python.org/issue11891
http://bugs.python.org/issue1731717
