I have a simple producer/consumer pattern set up in part of my GUI code. I'm attempting to profile just the consumer section to see if there's any chance for optimization. However, when I run the code with python -m cProfile -o out.txt myscript.py, I get an error thrown from Python's pickle module.
File "<string>", line 1, in <module>
File "c:\python27\lib\multiprocessing\forking.py", line 374, in main
self = load(from_parent)
File "c:\python27\lib\pickle.py", line 1378, in load
return Unpickler(file).load()
File "c:\python27\lib\pickle.py", line 858, in load
dispatch[key](self)
File "c:\python27\lib\pickle.py", line 880, in load_eof
raise EOFError
EOFError
The basic pattern in the code is
class MyProcess(multiprocessing.Process):
    def __init__(self, in_queue, msg_queue):
        multiprocessing.Process.__init__(self)
        self.in_queue = in_queue
        self.ext_msg_queue = msg_queue
        self.name = multiprocessing.current_process().name
    def run(self):
        ## Do Stuff with the queued items
This is usually fed tasks from the GUI side of things, but for testing purposes, I set it up as follows.
if __name__ == '__main__':
    queue = multiprocessing.Queue()
    msg_queue = multiprocessing.Queue()
    p = Grabber(queue)
    p.daemon = True
    p.start()
    time.sleep(20)
    p.join()
But upon trying to start the script, I get the above error message.
Is there a way around the error?
In my understanding, the cProfile module doesn't play well with multiprocessing on the command line. (See Python multiprocess profiling.)
One way to work around this is to run the profiler from within your code -- in particular, you'll need to set up a different profiler output file for each process in your pool. You can do this pretty easily by calling cProfile.runctx('a+b', globals(), locals(), 'profile-%s.out' % process_name).
http://docs.python.org/2/library/profile.html#module-cProfile
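For example, a minimal sketch of that idea applied to the MyProcess class from the question (the _consume helper is hypothetical, standing in for the real consumer loop) might look like this, with each worker profiling its own run and writing its own dump file:

import cProfile
import multiprocessing

class MyProcess(multiprocessing.Process):
    def __init__(self, in_queue, msg_queue):
        multiprocessing.Process.__init__(self)
        self.in_queue = in_queue
        self.ext_msg_queue = msg_queue

    def run(self):
        # Profile only this process; every worker writes its own dump file,
        # keyed by the process name, so the outputs don't clobber each other.
        cProfile.runctx('self._consume()', globals(), locals(),
                        'profile-%s.out' % self.name)

    def _consume(self):
        # Hypothetical stand-in for the real consumer loop.
        for item in iter(self.in_queue.get, None):
            pass  # do stuff with the queued items

Each profile-<name>.out file can then be inspected on its own with pstats.Stats.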
Related
I have code similar to what is shown below, and I want the proper way to run it without errors. I got a shared memory error, and the GUI also opens many times:
mainapp = ProcessFiles()
p = multiprocessing.Pool()
p.map(mainapp.getPdfInfo, Files_list)
p.close()

class ProcessFiles:
    def __init__(self):
        self.lock = multiprocessing.Lock()

    def getPdfInfo(self, file):
        # READ FILES DATA AND DO SOME STUFFS
        self.lock.acquire()
        # INSERT DATA TO DATABASE
        self.lock.release()
and this is the error message:
TypeError: can't pickle sqlite3.Connection objects
I also tried with multiprocessing.Manager() and got errors as well. The code is shown below:
mainapp = ProcessFiles()
p = multiprocessing.Pool()
p.map(mainapp.getPdfInfo, Files_list)
p.close()

class ProcessFiles:
    def __init__(self):
        m = multiprocessing.Manager()
        self.lock = m.Lock()

    def getPdfInfo(self, file):
        # READ FILES DATA AND DO SOME STUFFS
        self.lock.acquire()
        # INSERT DATA TO DATABASE
        self.lock.release()
and that's the error message:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
...
_check_not_importing_main()
File "C:\Python37\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
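For what it is worth, the RuntimeError at the end spells out the missing piece: with the spawn start method on Windows, everything that creates processes has to sit under an if __name__ == '__main__': guard. Below is only a sketch of that idiom -- the lock and database details are left aside, and the Files_list placeholder stands in for whatever the real code builds:

import multiprocessing

class ProcessFiles:
    def getPdfInfo(self, file):
        # Open the sqlite3 connection here, inside the worker, instead of
        # storing it on self in __init__: anything stored on self is pickled
        # together with the bound method, and sqlite3 connections cannot
        # be pickled.
        pass  # READ FILES DATA AND INSERT DATA TO DATABASE

if __name__ == '__main__':
    Files_list = ['a.pdf', 'b.pdf']  # placeholder for the real file list
    mainapp = ProcessFiles()
    # The Pool is created only under the guard, so spawned children can
    # import this module without re-running the process-creation code.
    with multiprocessing.Pool() as p:
        p.map(mainapp.getPdfInfo, Files_list)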
Hi, I'm trying to write a module that lets me read and send data via pyserial. I have to be able to read the data in parallel to my main script. With the help of a Stack Overflow user, I have a basic and working skeleton of the program, but when I tried adding a class I created that uses pyserial (it handles finding the port, speed, etc.) found here, I get the following error:
File "<ipython-input-1-830fa23bc600>", line 1, in <module>
runfile('C:.../pythonInterface1/Main.py', wdir='C:/Users/Daniel.000/Desktop/Daniel/Python/pythonInterface1')
File "C:...\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)
File "C:...\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Daniel.000/Desktop/Daniel/Python/pythonInterface1/Main.py", line 39, in <module>
p.start()
File "C:...\Anaconda3\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:...\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:...\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:...\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "C:...\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
ValueError: ctypes objects containing pointers cannot be pickled
This is the code I am using to call the class in SerialConnection.py
import multiprocessing
from time import sleep
from operator import methodcaller

from SerialConnection import SerialConnection as SC

class Spawn:
    def __init__(self, _number, _max):
        self._number = _number
        self._max = _max
        # Don't call update here

    def request(self, x):
        print("{} was requested.".format(x))

    def update(self):
        while True:
            print("Spawned {} of {}".format(self._number, self._max))
            sleep(2)

if __name__ == '__main__':
    '''
    spawn = Spawn(1, 1)  # Create the object as normal
    p = multiprocessing.Process(target=methodcaller("update"), args=(spawn,))  # Run the loop in the process
    p.start()
    while True:
        sleep(1.5)
        spawn.request(2)  # Now you can reference the "spawn"
    '''
    device = SC()
    print(device.Port)
    print(device.Baud)
    print(device.ID)
    print(device.Error)
    print(device.EMsg)
    p = multiprocessing.Process(target=methodcaller("ReadData"), args=(device,))  # Run the loop in the process
    p.start()
    while True:
        sleep(1.5)
        device.SendData('0003')
What am I doing wrong for this class to be giving me problems? Is there some restriction on using pyserial and multiprocessing together? I know it can be done, but I don't understand how...
Here is the traceback I get from Python:
Traceback (most recent call last):
File "C:...\Python\pythonInterface1\Main.py", line 45, in <module>
p.start()
File "C:...\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:...\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:...\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:...\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:...\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
ValueError: ctypes objects containing pointers cannot be pickled
You are trying to pass a SerialConnection instance to another process as an argument. For that, Python first has to serialize (pickle) the object, and that is not possible for SerialConnection objects.
As said in Rob Streeting's answer, a possible solution would be to allow the SerialConnection object to be copied to the other process' memory using the fork that occurs when multiprocessing.Process.start is invoked, but this will not work on Windows as it does not use fork.
A simpler, cross-platform and more efficient way to achieve parallelism in your code would be to use a thread instead of a process. The changes to your code are minimal:
import threading
p = threading.Thread(target=methodcaller("ReadData"), args=(device,))
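Applied to the main block from the question, the thread-based version might look roughly like this (a sketch; the SerialConnection interface is assumed to be exactly as used above):

import threading
from time import sleep
from operator import methodcaller

from SerialConnection import SerialConnection as SC

if __name__ == '__main__':
    device = SC()
    # A thread shares memory with the main script, so the device object is
    # never pickled -- the reader loop just uses the same object directly.
    p = threading.Thread(target=methodcaller("ReadData"), args=(device,))
    p.daemon = True  # don't keep the interpreter alive just for the reader
    p.start()
    while True:
        sleep(1.5)
        device.SendData('0003')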
I think the problem is due to something inside device being unpicklable (i.e., not serializable by python). Take a look at this page to see if you can see any rules that may be broken by something in your device object.
So why does device need to be picklable at all?
When a multiprocessing.Process is started, it uses fork() at the operating system level (unless otherwise specified) to create the new process. What this means is that the whole context of the parent process is "copied" over to the child. This does not require pickling, as it's done at the operating system level.
(Note: On Unix at least, this "copy" is actually a pretty cheap operation because it uses a feature called "copy-on-write". This means that both parent and child processes actually read from the same memory until one or the other modifies it, at which point the original state is copied over to the child process.)
However, the arguments of the function that you want the process to take care of do have to be pickled, because they are not part of the main process's context. So, that includes your device variable.
I think you might be able to resolve your issue by allowing device to be copied as part of the fork operation rather than passing it in as a variable. To do this though, you'll need a wrapper function around the operation you want your process to do, in this case methodcaller("ReadData"). Something like this:
if __name__ == "__main__":
device = SC()
def call_read_data():
device.ReadData()
...
p = multiprocessing.Process(target=call_read_data) # Run the loop in the process
p.start()
I'm experimenting with Python's multiprocessing. I struggled with a bug in my code and managed to narrow it down. However, I still don't know why this happens. What I'm posting is just sample code. If I import the tempfile module and change tempdir, the code crashes at pool creation. I'm using Python 2.7.5.
Here's the code
from multiprocessing import Pool
import tempfile

tempfile.tempdir = "R:/"  # REMOVING THIS LINE FIXES THE ERROR

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)            # start 4 worker processes
    result = pool.apply_async(f, [10])  # evaluate "f(10)" asynchronously
    print result.get(timeout=1)         # prints "100" unless your computer is *very* slow
    print pool.map(f, range(10))        # prints "[0, 1, 4,..., 81]"
Here's the error:
R:\>mp_pool_test.py
Traceback (most recent call last):
File "R:\mp_pool_test.py", line 11, in <module>
pool = Pool(processes=4) # start 4 worker processes
File "C:\Python27\lib\multiprocessing\__init__.py", line 232, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild)
File "C:\Python27\lib\multiprocessing\pool.py", line 138, in __init__
self._setup_queues()
File "C:\Python27\lib\multiprocessing\pool.py", line 233, in _setup_queues
self._inqueue = SimpleQueue()
File "C:\Python27\lib\multiprocessing\queues.py", line 351, in __init__
self._reader, self._writer = Pipe(duplex=False)
File "C:\Python27\lib\multiprocessing\__init__.py", line 107, in Pipe
return Pipe(duplex)
File "C:\Python27\lib\multiprocessing\connection.py", line 223, in Pipe
1, obsize, ibsize, win32.NMPWAIT_WAIT_FOREVER, win32.NULL
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect
This code works fine.
from multiprocessing import Pool
import tempfile as TF

TF.tempdir = "R:/"

def f(x):
    return x*x

if __name__ == '__main__':
    print("test")
The bizarre thing is that in both cases I don't actually do anything with the tempdir, yet the version that creates a Pool fails for some reason.
It is an interesting one -- it looks like you have a name collision, from what I can see in
"C:\Program Files\PYTHON\Lib\multiprocessing\connection.py"
It seems that multiprocessing is using tempfile as well.
That behavior should not happen, but it looks to me like the problem is in line 66 of connection.py:
elif family == 'AF_PIPE':
    return tempfile.mktemp(prefix=r'\\.\pipe\pyc-%d-%d-' %
                           (os.getpid(), _mmap_counter.next()))
I am still poking at this. I looked at globals after importing tempfile and then tempfile as TF; different names exist, but now I am wondering about references and trying to figure out whether they point to the same module object.
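One workaround (my suggestion, not something from the traceback) is to leave the module-level tempfile.tempdir alone and pass dir= to the tempfile calls you control; your files still land on R:/, while multiprocessing keeps building its named-pipe address from the default temporary directory. A rough sketch:

from multiprocessing import Pool
import os
import tempfile

def f(x):
    return x*x

if __name__ == '__main__':
    # Request R:/ explicitly for your own temporary file instead of
    # overriding tempfile.tempdir for every consumer of the module.
    fd, path = tempfile.mkstemp(dir="R:/")
    os.close(fd)

    pool = Pool(processes=4)
    print(pool.map(f, range(10)))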
I wrote a script on a Linux platform using the multiprocessing module of Python. When I tried running the program on Windows, it did not work directly, which I found out is related to how child processes are created on Windows. It seems to be crucial that the objects which are used can be pickled.
My main problem is that I am using large numpy arrays. It seems that above a certain size they are not picklable any more. To break it down to a simple script, I want to do something like this:
### Import modules

import numpy as np
import multiprocessing as mp

number_of_processes = 4

if __name__ == '__main__':

    def reverse_np_array(arr):
        arr = arr + 1
        return arr

    a = np.ndarray((200,1024,1280), dtype=np.uint16)

    def put_into_queue(_Queue, arr):
        _Queue.put(reverse_np_array(arr))

    Queue_list = []
    Process_list = []
    list_of_arrays = []

    for i in range(number_of_processes):
        Queue_list.append(mp.Queue())

    for i in range(number_of_processes):
        Process_list.append(mp.Process(target=put_into_queue, args=(Queue_list[i], a)))

    for i in range(number_of_processes):
        Process_list[i].start()

    for i in range(number_of_processes):
        list_of_arrays.append(Queue_list[i].get())

    for i in range(number_of_processes):
        Process_list[i].join()
I get the following error message:
Traceback (most recent call last):
File "Windows_multi.py", line 34, in <module>
Process_list[i].start()
File "C:\Program Files\Anaconda32\lib\multiprocessing\process.py", line 130, i
n start
self._popen = Popen(self)
File "C:\Program Files\Anaconda32\lib\multiprocessing\forking.py", line 277, i
n __init__
dump(process_obj, to_child, HIGHEST_PROTOCOL)
File "C:\Program Files\Anaconda32\lib\multiprocessing\forking.py", line 199, i
n dump
ForkingPickler(file, protocol).dump(obj)
File "C:\Program Files\Anaconda32\lib\pickle.py", line 224, in dump
self.save(obj)
File "C:\Program Files\Anaconda32\lib\pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "C:\Program Files\Anaconda32\lib\pickle.py", line 419, in save_reduce
save(state)
File "C:\Program Files\Anaconda32\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Program Files\Anaconda32\lib\pickle.py", line 649, in save_dict
self._batch_setitems(obj.iteritems())
So I am basically creating a large array which I need in all processes, doing calculations with this array, and returning it.
One important thing seems to be to write the definitions of the functions before the if __name__ == '__main__': statement.
The whole thing is working if I reduce the array to (50,1024,1280).
However, even if 4 processes are started and 4 cores are working, it is slower than the same code without multiprocessing running on a single core (on Windows). So I think I have another problem here.
The function in my real program later on is in a cython module.
I am using the anaconda package with python 32-bit since I could not get my cython package compiled with the 64-bit version (I'll ask about that in a different thread).
Any help is welcome!!
Thanks!
Philipp
UPDATE:
The first mistake I made was having the "put_into_queue" function definition inside __main__.
Then I introduced shared arrays as suggested; however, this uses a lot of memory, and the memory used scales with the number of processes (which should of course not be the case).
Any idea what I am doing wrong here? It does not seem to matter where I place the definition of the shared array (inside or outside __main__), though I think it should be inside __main__. I got this from this post: Is shared readonly data copied to different processes for Python multiprocessing?
import numpy as np
import multiprocessing as mp
import ctypes

shared_array_base = mp.Array(ctypes.c_uint, 1280*1024*20)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
#print shared_array
shared_array = shared_array.reshape(20,1024,1280)

number_of_processes = 4

def put_into_queue(_Queue, arr):
    _Queue.put(reverse_np_array(arr))

def reverse_np_array(arr):
    arr = arr + 1 + np.random.rand()
    return arr

if __name__ == '__main__':
    #print shared_array
    #a = np.ndarray((50,1024,1280),dtype=np.uint16)

    Queue_list = []
    Process_list = []
    list_of_arrays = []

    for i in range(number_of_processes):
        Queue_list.append(mp.Queue())

    for i in range(number_of_processes):
        Process_list.append(mp.Process(target=put_into_queue, args=(Queue_list[i], shared_array)))

    for i in range(number_of_processes):
        Process_list[i].start()

    for i in range(number_of_processes):
        list_of_arrays.append(Queue_list[i].get())

    for i in range(number_of_processes):
        Process_list[i].join()
You didn't include the full traceback; the end is most important. On my 32-bit Python I get the same traceback that finally ends in
File "C:\Python27\lib\pickle.py", line 486, in save_string
self.write(BINSTRING + pack("<i", n) + obj)
MemoryError
MemoryError is the exception and it says you ran out of memory.
64-bit Python would get around this, but sending large amounts of data between processes can easily become a serious bottleneck in multiprocessing.
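Not part of the answer above, but one way to dodge that bottleneck is to give every worker a handle to a shared multiprocessing.Array and send only small results back through the queue, so the big array itself is never pickled. A rough sketch (the worker function, the slice bounds, and the shape constant are mine; the 20x1024x1280 size mirrors the update above):

import numpy as np
import multiprocessing as mp
import ctypes

shape = (20, 1024, 1280)

def worker(shared_base, start, stop, result_queue):
    # Re-wrap the shared buffer inside the worker; only the small
    # per-slice result travels back through the queue.
    arr = np.ctypeslib.as_array(shared_base.get_obj()).reshape(shape)
    result_queue.put(arr[start:stop].sum())

if __name__ == '__main__':
    shared_array_base = mp.Array(ctypes.c_uint, shape[0]*shape[1]*shape[2])
    q = mp.Queue()
    procs = [mp.Process(target=worker, args=(shared_array_base, i*5, (i+1)*5, q))
             for i in range(4)]
    for p in procs:
        p.start()
    totals = [q.get() for _ in procs]
    for p in procs:
        p.join()
    print(sum(totals))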
Sorry in advance, this is going to be long ...
Possibly related:
Python Multiprocessing atexit Error "Error in atexit._run_exitfuncs"
Definitely related:
python parallel map (multiprocessing.Pool.map) with global data
Keyboard Interrupts with python's multiprocessing Pool
Here's a "simple" script I hacked together to illustrate my problem...
import time
import multiprocessing as multi
import atexit

cleanup_stuff = multi.Manager().list([])

##################################################
# Some code to allow keyboard interrupts
##################################################
was_interrupted = multi.Manager().list([])

class _interrupt(object):
    """
    Toy class to allow retrieval of the interrupt that triggered its execution
    """
    def __init__(self, interrupt):
        self.interrupt = interrupt

def interrupt():
    was_interrupted.append(1)

def interruptable(func):
    """
    decorator to allow functions to be "interruptable" by
    a keyboard interrupt when in python's multiprocessing.Pool.map
    **Note**, this won't actually cause the Map to be interrupted,
    It will merely cause the following functions to be not executed.
    """
    def newfunc(*args, **kwargs):
        try:
            if not was_interrupted:
                return func(*args, **kwargs)
            else:
                return False
        except KeyboardInterrupt as e:
            interrupt()
            return _interrupt(e)  # If we really want to know about the interrupt...
    return newfunc

@atexit.register
def cleanup():
    for i in cleanup_stuff:
        print(i)
    return

@interruptable
def func(i):
    print(i)
    cleanup_stuff.append(i)
    time.sleep(float(i)/10.)
    return i

# Must wrap func here, otherwise it won't be found in __main__'s dict
# Maybe because it was created dynamically using the decorator?
def wrapper(*args):
    return func(*args)
if __name__ == "__main__":
#This is an attempt to use signals -- I also attempted something similar where
#The signals were only caught in the child processes...Or only on the main process...
#
#import signal
#def onSigInt(*args): interrupt()
#signal.signal(signal.SIGINT,onSigInt)
#Try 2 with signals (only catch signal on main process)
#import signal
#def onSigInt(*args): interrupt()
#signal.signal(signal.SIGINT,onSigInt)
#def startup(): signal.signal(signal.SIGINT,signal.SIG_IGN)
#p=multi.Pool(processes=4,initializer=startup)
#Try 3 with signals (only catch signal on child processes)
#import signal
#def onSigInt(*args): interrupt()
#signal.signal(signal.SIGINT,signal.SIG_IGN)
#def startup(): signal.signal(signal.SIGINT,onSigInt)
#p=multi.Pool(processes=4,initializer=startup)
p=multi.Pool(4)
try:
out=p.map(wrapper,range(30))
#out=p.map_async(wrapper,range(30)).get() #This doesn't work either...
#The following lines don't work either
#Effectively trying to roll my own p.map() with p.apply_async
# results=[p.apply_async(wrapper,args=(i,)) for i in range(30)]
# out = [ r.get() for r in results() ]
except KeyboardInterrupt:
print ("Hello!")
out=None
finally:
p.terminate()
p.join()
print (out)
This works just fine if no KeyboardInterrupt is raised. However, if I raise one, the following exception occurs:
10
7
9
12
^CHello!
None
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "test.py", line 58, in cleanup
for i in cleanup_stuff:
File "<string>", line 2, in __getitem__
File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
c = SocketClient(address)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
s.connect(address)
File "<string>", line 1, in connect
error: [Errno 2] No such file or directory
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "test.py", line 58, in cleanup
for i in cleanup_stuff:
File "<string>", line 2, in __getitem__
File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
c = SocketClient(address)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
s.connect(address)
File "<string>", line 1, in connect
socket.error: [Errno 2] No such file or directory
Interestingly enough, the code does exit the Pool.map function without calling any of the additional functions ... The problem seems to be that the KeyboardInterrupt isn't handled properly at some point, but it is a little confusing where that is, and why it isn't handled in interruptable. Thanks.
Note, the same problem happens if I use out=p.map_async(wrapper,range(30)).get()
EDIT 1
A little closer ... If I enclose the out=p.map(...) in a try,except,finally clause, it gets rid of the first exception ... the other ones are still raised in atexit however. The code and traceback above have been updated.
EDIT 2
Something else that does not work has been added to the code above as a comment. (Same error). This attempt was inspired by:
http://jessenoller.com/2009/01/08/multiprocessingpool-and-keyboardinterrupt/
EDIT 3
Another failed attempt using signals added to the code above.
EDIT 4
I have figured out how to restructure my code so that the above is no longer necessary. In the (unlikely) event that someone stumbles upon this thread with the same use-case that I had, I will describe my solution ...
Use Case
I have a function which generates temporary files using the tempfile module. I would like those temporary files to be cleaned up when the program exits. My initial attempt was to pack each temporary file name into a list and then delete all the elements of the list with a function registered via atexit.register. The problem was that the list was not being updated across multiple processes. This is where I got the idea of using multiprocessing.Manager to manage the list data. Unfortunately, this fails on a KeyboardInterrupt no matter how hard I tried, because the communication sockets between processes were broken for some reason. The solution to this problem is simple. Prior to using multiprocessing, set the temporary file directory ... something like tempfile.tempdir=tempfile.mkdtemp() and then register a function to delete the temporary directory. Each of the processes writes to the same temporary directory, so it works. Of course, this solution only works where the shared data is a list of files that needs to be deleted at the end of the program's life.
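A sketch of that arrangement (the make_temp_file worker and the pool size are made up; only the tempdir-plus-atexit idea comes from the description above, and it relies on the children inheriting the setting via fork, as on the Linux setup in question):

import atexit
import multiprocessing
import os
import shutil
import tempfile

# One shared temporary directory for the whole program; forked workers
# inherit this setting and write their files under the same directory.
tempfile.tempdir = tempfile.mkdtemp()

@atexit.register
def cleanup_tempdir():
    # Remove the directory, and everything the workers put in it, when the
    # main process exits.
    shutil.rmtree(tempfile.tempdir, ignore_errors=True)

def make_temp_file(i):
    # Hypothetical worker: creates one temporary file and returns its path.
    fd, path = tempfile.mkstemp(suffix='-%d.tmp' % i)
    os.close(fd)
    return path

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    try:
        print(pool.map(make_temp_file, range(10)))
    finally:
        pool.terminate()
        pool.join()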