Can anyone explain to me why the last example (Example 3. Multiprocess wrapper for Net-SNMP) on the following page, https://www.ibm.com/developerworks/aix/library/au-multiprocessing/, does not raise a PicklingError?
I have tried it with my own bound method that updates and returns an instance attribute (similar to the example, which updates and returns an attribute of the instance), and it raises the following error:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
PicklingError: Can't pickle <type 'thread.lock'>: attribute lookup thread.lock failed
Here is my code:
from multiprocessing import Process, Queue
import requests

class MyClass(object):
    def do_request(self, url):
        try:
            self.response = requests.get(url)
        except:
            self.response = None
        return self.response

def make_request(url):
    s = MyClass()
    return s.do_request(url)

# Function run by worker processes
def worker(input, output):
    for func in iter(input.get, 'STOP'):
        result = make_request(func)
        output.put(result)

def main():
    """Runs everything"""
    # clients
    urls = ['http://www.google.com', 'http://www.amazon.com']
    NUMBER_OF_PROCESSES = len(urls)
    # Create queues
    task_queue = Queue()
    done_queue = Queue()
    # Submit tasks
    for url in urls:
        task_queue.put(url)
    # Start worker processes
    for i in range(NUMBER_OF_PROCESSES):
        Process(target=worker, args=(task_queue, done_queue)).start()
    # Get and print results
    print 'Unordered results:'
    for i in range(len(urls)):
        print '\t', done_queue.get()
    # Tell child processes to stop
    for i in range(NUMBER_OF_PROCESSES):
        task_queue.put('STOP')
        print "Stopping Process #%s" % i

if __name__ == "__main__":
    main()
The problem is that the return value of requests.get() is not a picklable object. You'll need to extract the information you want from the response and return that to the parent process. Personally, I like to stick to simple types plus lists and dicts for this kind of thing; it keeps the number of bad things that can happen to a minimum.
Here's an experiment that threw a much messier exception on my Linux + Python 2.7 machine, but it gives you an idea of the problem:
>>> import requests
>>> import pickle
>>> r = requests.get('http://google.com')
>>> pickle.dumps(r)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  File "/usr/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  ... many lines removed
  File "/usr/lib/python2.7/copy_reg.py", line 77, in _reduce_ex
    raise TypeError("a class that defines __slots__ without "
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
You can test whether I'm right or not by replacing self.response = requests.get(url) with self.response = "hello".
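For instance, here is a hedged sketch of that extraction approach, assuming the status code and body text are the fields you actually need:

import requests

class MyClass(object):
    def do_request(self, url):
        try:
            r = requests.get(url)
            # Keep only plain, picklable data pulled out of the Response.
            self.response = {'status': r.status_code, 'body': r.text}
        except requests.RequestException:
            self.response = None
        return self.response

With that change, the dicts (or None) travel through the Queue without any pickling trouble.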
Related
While trying out the multiprocessing Pool module, I noticed that it does not work when I am loading or opening any kind of file. The code below works as expected, but when I uncomment the two lines marked as the culprit, the script skips the pool.apply_async call and loopingTest never runs.
import time
from multiprocessing import Pool

class MultiClass:
    def __init__(self):
        file = 'test.txt'
        # with open(file, 'r') as f:  # This is the culprit
        #     self.d = f
        self.n = 50000000
        self.cases = ['1st time', '2nd time']
        self.multiProc(self.cases)
        print("It's done")

    def loopingTest(self, cases):
        print(f"looping start for {cases}")
        n = self.n
        while n > 0:
            n -= 1
        print(f"looping done for {cases}")

    def multiProc(self, cases):
        test = False
        pool = Pool(processes=2)
        if not test:
            for i in cases:
                pool.apply_async(self.loopingTest, (i,))
        pool.close()
        pool.join()

if __name__ == '__main__':
    start = time.time()
    w = MultiClass()
    end = time.time()
    print(f'Script finished in {end - start} seconds')
You see this behavior because calling apply_async fails when you save the file descriptor (self.d) to your instance. When you call apply_async(self.loopingTest, ...), Python needs to pickle self.loopingTest to send it to the worker process, which also requires pickling self. When you have the open file descriptor saved as a property of self, the pickling fails, because file descriptors can't be pickled. You'll see this for yourself if you use apply instead of apply_async in your sample code. You'll get an error like this:
Traceback (most recent call last):
  File "a.py", line 36, in <module>
    w = MultiClass()
  File "a.py", line 12, in __init__
    self.multiProc(self.cases)
  File "a.py", line 28, in multiProc
    out.get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
    put(task)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.TextIOWrapper' object
You need to change your code to avoid this: either don't save the file descriptor on self and only create it in the worker method (if that's where you need it), or use the tools Python provides to control the pickle/unpickle process for your class. Depending on the use case, you can also turn the method you're passing to apply_async into a top-level function, so that self doesn't need to be pickled at all.
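As a hedged sketch of the second option, assuming the worker really does need its own handle to test.txt: define __getstate__/__setstate__ so the file object is dropped on pickling and reopened in the child. FileUser here is a hypothetical minimal class, not the full MultiClass from the question.

from multiprocessing import Pool

class FileUser:
    def __init__(self, filename='test.txt'):
        self.filename = filename
        self.d = open(filename, 'r')

    def __getstate__(self):
        # Drop the unpicklable file object before the instance is pickled.
        state = self.__dict__.copy()
        del state['d']
        return state

    def __setstate__(self, state):
        # Reopen the file in the worker process instead of inheriting it.
        self.__dict__.update(state)
        self.d = open(self.filename, 'r')

    def use_file(self, label):
        print(f"{label}: first line is {self.d.readline()!r}")

if __name__ == '__main__':
    u = FileUser()  # assumes test.txt exists next to the script
    with Pool(processes=2) as pool:
        pool.apply(u.use_file, ('worker',))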
I have code similar to what is shown below, and I would like to know the proper way to run it without errors. I get a shared-memory error, and the GUI also opens many times:
import multiprocessing

mainapp = ProcessFiles()
p = multiprocessing.Pool()
p.map(mainapp.getPdfInfo, Files_list)
p.close()

class ProcessFiles:
    def __init__(self):
        self.lock = multiprocessing.Lock()

    def getPdfInfo(self, file):
        # READ FILE DATA AND DO SOME STUFF
        self.lock.acquire()
        # INSERT DATA INTO DATABASE
        self.lock.release()
and this is the error message:
TypeError: can't pickle sqlite3.Connection objects
I also tried with multiprocessing.Manager() and got errors as well. The code is shown below:
import multiprocessing

mainapp = ProcessFiles()
p = multiprocessing.Pool()
p.map(mainapp.getPdfInfo, Files_list)
p.close()

class ProcessFiles:
    def __init__(self):
        m = multiprocessing.Manager()
        self.lock = m.Lock()

    def getPdfInfo(self, file):
        # READ FILE DATA AND DO SOME STUFF
        self.lock.acquire()
        # INSERT DATA INTO DATABASE
        self.lock.release()
and this is the error message:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  ...
    _check_not_importing_main()
  File "C:\Python37\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
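For reference, a minimal hedged sketch of the idiom this error message asks for, combined with the Manager lock from the second attempt (the file list and the database step are placeholders):

import multiprocessing

class ProcessFiles:
    def __init__(self, lock):
        self.lock = lock  # a Manager lock proxy, which pickles cleanly

    def getPdfInfo(self, file):
        # read file data and do some stuff
        with self.lock:
            pass  # insert data into the database here

if __name__ == '__main__':
    # Everything that starts processes lives behind this guard, so that
    # spawned children can re-import the module without re-running it
    # (the re-running is also what makes the GUI open many times).
    manager = multiprocessing.Manager()
    mainapp = ProcessFiles(manager.Lock())
    files_list = ['a.pdf', 'b.pdf']  # placeholder for Files_list
    p = multiprocessing.Pool()
    p.map(mainapp.getPdfInfo, files_list)
    p.close()
    p.join()

Opening the sqlite3 connection inside getPdfInfo (per call, in the worker) rather than storing it on self should likewise avoid the "can't pickle sqlite3.Connection objects" error.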
Below is my Python script.
import multiprocessing
# We must import this explicitly, it is not imported by the top-level
# multiprocessing module.
import multiprocessing.pool
import time
from random import randint

class NoDaemonProcess(multiprocessing.Process):
    # make 'daemon' attribute always return False
    def _get_daemon(self):
        return False

    def _set_daemon(self, value):
        pass

    daemon = property(_get_daemon, _set_daemon)

# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class MyPool(multiprocessing.pool.Pool):
    Process = NoDaemonProcess

def sleepawhile(t):
    print("Sleeping %i seconds..." % t)
    time.sleep(t)
    return t

def work(num_procs):
    print("Creating %i (daemon) workers and jobs in child." % num_procs)
    pool = multiprocessing.Pool(num_procs)
    result = pool.map(sleepawhile,
                      [randint(1, 5) for x in range(num_procs)])
    # The following is not really needed, since the (daemon) workers of the
    # child's pool are killed when the child is terminated, but it's good
    # practice to cleanup after ourselves anyway.
    pool.close()
    pool.join()
    return result

def test():
    print("Creating 5 (non-daemon) workers and jobs in main process.")
    pool = MyPool(20)
    result = pool.map(work, [randint(1, 5) for x in range(5)])
    pool.close()
    pool.join()
    print(result)

if __name__ == '__main__':
    test()
This is running on an Ubuntu server and I'm using Python 3.6.7.
I had this working properly; after an apt-get upgrade I'm getting the error
group argument must be None for now
What might be the cause of the error I'm facing? Should I change the Python version, or should I roll back the changes from the upgrade?
EDIT 1
Stack trace of the exception:
Traceback (most recent call last):
  File "/src/mainapp.py", line 104, in bulkfun
    p = MyPool(20)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 175, in __init__
    self._repopulate_pool()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 236, in _repopulate_pool
    self._wrap_exception)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 250, in _repopulate_pool_static
    wrap_exception)
  File "/usr/lib/python3.6/multiprocessing/process.py", line 73, in __init__
    assert group is None, 'group argument must be None for now'
AssertionError: group argument must be None for now
EDIT 2
The code works on Python 2.7 and Python 3.5, but if I run it with Python 3.6.7 I get the error below:
Creating 5 (non-daemon) workers and jobs in main process.
Traceback (most recent call last):
  File "multi.py", line 52, in <module>
    test()
  File "multi.py", line 43, in test
    pool = MyPool(5)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 175, in __init__
    self._repopulate_pool()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 236, in _repopulate_pool
    self._wrap_exception)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 250, in _repopulate_pool_static
    wrap_exception)
  File "/usr/lib/python3.6/multiprocessing/process.py", line 73, in __init__
    assert group is None, 'group argument must be None for now'
AssertionError: group argument must be None for now
I came across this issue while upgrading the Travis distribution from 14.04 to 16.04, when Python 3.6 started to fail. I found a solution, since it had already been fixed in another package: FIX: Python 2.7-3.7.1 compatible NonDaemonPool
class NonDaemonPool(multiprocessing.pool.Pool):
    def Process(self, *args, **kwds):
        proc = super(NonDaemonPool, self).Process(*args, **kwds)

        class NonDaemonProcess(proc.__class__):
            """Monkey-patch process to ensure it is never daemonized"""
            @property
            def daemon(self):
                return False

            @daemon.setter
            def daemon(self, val):
                pass

        proc.__class__ = NonDaemonProcess
        return proc
Same here. This code worked in my case (Python 3.6.7), from https://stackoverflow.com/a/53180921/10742388:
class NoDaemonProcess(multiprocessing.Process):
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass

class NoDaemonContext(type(multiprocessing.get_context())):
    Process = NoDaemonProcess

# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class MyPool(multiprocessing.pool.Pool):
    def __init__(self, *args, **kwargs):
        kwargs['context'] = NoDaemonContext()
        super(MyPool, self).__init__(*args, **kwargs)
I think this problem comes from a change to process.py (https://github.com/python/cpython/blob/8ca0fa9d2f4de6e69f0902790432e0ab2f37ba68/Lib/multiprocessing/process.py#L189).
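As a quick hedged usage check (assuming the work() function from the script earlier in the question), this context-based MyPool drops in unchanged:

if __name__ == '__main__':
    pool = MyPool(5)
    # Each work() call may create its own inner Pool; that is the point:
    # the outer workers are no longer daemonic, so they may have children.
    print(pool.map(work, [2, 3]))
    pool.close()
    pool.join()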
The code is shown below. When I copy and paste it into my cmd prompt, it throws 'module' object has no attribute 'func', but when I save it as a .py file and execute python test.py, it works fine.
import multiprocessing
import time

def func(msg):
    for i in xrange(3):
        print msg
        time.sleep(1)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    for i in xrange(5):
        msg = "hello %d" % (i)
        pool.apply_async(func, (msg, ))
    pool.close()
    pool.join()
    print "Sub-process(es) done."
Could anyone explain the difference between running Python code at the prompt and running it from a file? Thanks a lot!
This is happening because on Windows, func needs to be pickled and sent to the child process via IPC. In order for the child to unpickle func, it needs to be able to import it from the parent's __main__ module. When this happens in a normal Python script, the child can re-import your script, and __main__ will contain all the functions declared at the top level of your script, so it works fine. However, in the interactive interpreter, functions you've defined can't simply be re-imported from a file the way they can in a normal script, so they will not be in __main__ in the child. This is clearer if you use multiprocessing.Process directly to recreate the issue:
>>> def f():
...     print "HI"
...
>>> import multiprocessing
>>> p = multiprocessing.Process(target=f)
>>> p.start()
>>> Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\python27\lib\multiprocessing\forking.py", line 381, in main
    self = load(from_parent)
  File "C:\python27\lib\pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "C:\python27\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "C:\python27\lib\pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "C:\python27\lib\pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'f'
This way, it's clearer that pickle can't find the module. If you add some tracing to pickle.py you can see that 'module' is referring to __main__:
def load_global(self):
    module = self.readline()[:-1]
    name = self.readline()[:-1]
    print("module {} name {}".format(module, name))  # I added this.
    klass = self.find_class(module, name)
    self.append(klass)
Rerunning the same code with that extra print statement yields this:
module multiprocessing.process name Process
module __main__ name f
<same traceback as before>
It's worth noting that this example actually works fine on Posix platforms, because os.fork() is used to spawn the child processes, which means that any function defined prior to the Pool being created will be available in the child's __main__ module. So, while the above example will work, this one will still fail, because the worker function is defined after creating the Pool (which means after os.fork() is called):
>>> import multiprocessing
>>> p = multiprocessing.Pool(2)
>>> def f(a):
...     print(a)
...
>>> p.apply(f, "hi")
Process PoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 231, in _bootstrap
    self.run()
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 57, in worker
    task = get()
  File "/usr/lib64/python2.6/multiprocessing/queues.py", line 339, in get
    return recv()
AttributeError: 'module' object has no attribute 'f'
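One hedged workaround if you want to keep experimenting from the interactive interpreter: define the worker in a real module (here a hypothetical workers.py somewhere on the import path) so the child can import it by name instead of looking for it in __main__:

# workers.py (hypothetical helper module)
def func(msg):
    print(msg)

Then, at the prompt:

>>> import multiprocessing
>>> import workers
>>> pool = multiprocessing.Pool(2)
>>> pool.apply(workers.func, ("hi",))
hi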
I have created this sample program to generalize the issue I am facing:
import multiprocessing
from multiprocessing import Manager

def f(_print):
    print _print
    manager = multiprocessing.Manager()
    dict = manager.dict()
    dict['process_obj'] = multiprocessing.current_process()
    print dict

if __name__ == '__main__':
    process = multiprocessing.Process(target=f, args=('hello function', ))
    process.start()
    process.join()
So how do I store a process object in multiprocessing Manager.dict()?
I assume you're talking about getting this error:
hello function
Process Process-1:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "mp2.py", line 8, in f
    dict['process_obj'] = multiprocessing.current_process()
  File "<string>", line 2, in __setitem__
  File "/usr/local/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
(it's generally a good idea to include "what I got" and "what I expected to get instead" in the question).
The fundamental problem here is that pickling what multiprocessing.current_process() returns runs into an instance method. Instance methods don't pickle properly, and multiprocessing has to save (pickle) and load (unpickle) shared data items to communicate their values from one process to another. See, e.g., Can't pickle <type 'instancemethod'> when using python's multiprocessing Pool.map() and Overcoming Python's limitations regarding instance methods. Note in particular one of the answers in the second: it might be better to figure out some state to send/share, rather than an entire instance. For instance, if the ident of a process suffices, you can do this:
dict['process_obj'] = multiprocessing.current_process().ident
which works fine.
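Put together, a hedged rewrite of the sample that stores only simple picklable fields (ident and name here are assumptions about what you actually need from the process):

import multiprocessing

def f(shared):
    # Store plain attributes of the process, not the Process object itself.
    shared['process_ident'] = multiprocessing.current_process().ident
    shared['process_name'] = multiprocessing.current_process().name

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    shared = manager.dict()
    process = multiprocessing.Process(target=f, args=(shared,))
    process.start()
    process.join()
    print dict(shared)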