I am trying out some simple programs that use the multiprocessing features in Python.
The code is given below:
from multiprocessing import Process, Queue

def print_square(i):
    print i*i

if __name__ == '__main__':
    output = Queue()
    processes = [Process(target=print_square, args=(i,)) for i in range(5)]

    for p in processes:
        p.start()

    for p in processes:
        p.join()
However, this fails with AttributeError: 'module' object has no attribute 'heappush'. The complete output from running the script is below:
Traceback (most recent call last):
  File "parallel_3.py", line 15, in <module>
    output = Queue()
  File "C:\Users\abc\AppData\Local\Continuum\Anaconda2\lib\multiprocessing\__init__.py", line 217, in Queue
    from multiprocessing.queues import Queue
  File "C:\Users\abc\AppData\Local\Continuum\Anaconda2\lib\multiprocessing\queues.py", line 45, in <module>
    from Queue import Empty, Full
  File "C:\Users\abc\AppData\Local\Continuum\Anaconda2\lib\Queue.py", line 212, in <module>
    class PriorityQueue(Queue):
  File "C:\Users\abc\AppData\Local\Continuum\Anaconda2\lib\Queue.py", line 224, in PriorityQueue
    def _put(self, item, heappush=heapq.heappush):
AttributeError: 'module' object has no attribute 'heappush'
The code runs fine if the output = Queue() statement is commented out.
What could possibly be causing this error?
One of your files must be named after the standard-library module it shadows, in this case heapq.py, so Python imports your file instead of the real heapq. Rename it to something else, such as heap.py, and it will work.
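For illustration, here is a minimal sketch of how the shadowing reproduces the traceback above (the file name heapq.py is an assumption based on the suggested rename):

# Hypothetical repro: save this as heapq.py in your working directory.
# The stdlib Queue.py does "import heapq" and finds this file first,
# so heapq.heappush does not exist and the import chain fails.
from multiprocessing import Queue

if __name__ == '__main__':
    output = Queue()  # AttributeError: 'module' object has no attribute 'heappush'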
I'm trying to add spawned subprocesses to a queue so that only one can execute at a time. I don't want to wait for the process to execute with each iteration of the for loop because there is some code at the beginning of the loop that can run in parallel with the processes.
Example code:
from multiprocessing import Queue
import subprocess

q = Queue()

hrs = range(0, 12)
for hr in hrs:
    print(hr)

    # There will be other code here that takes some time to run

    p = subprocess.Popen(['python', 'test.py', '--hr={}'.format(hr)])
    q.put(p)
This results in:
Traceback (most recent call last):
  File "/home/kschneider/anaconda3/envs/ewall/lib/python3.8/multiprocessing/queues.py", line 239, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/kschneider/anaconda3/envs/ewall/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot pickle '_thread.lock' object
Is there a different way to set this up that will not result in the thread lock errors?
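No answer is recorded here, but as a hedged sketch: the Popen handle holds a _thread.lock, so it cannot be pickled into a multiprocessing.Queue, and it never needs to leave the parent process anyway. Keeping the previous handle in a local variable (prev is an illustrative name, not from the original code) gives the "one child at a time, loop body overlapped" behaviour:

import subprocess

prev = None
for hr in range(0, 12):
    print(hr)
    # The other per-iteration code goes here; it overlaps with the
    # previously launched child, which may still be running.
    if prev is not None:
        prev.wait()  # ensure at most one child executes at a time
    prev = subprocess.Popen(['python', 'test.py', '--hr={}'.format(hr)])

if prev is not None:
    prev.wait()  # wait for the final child before exiting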
I'd like to send data defined in a module that was dynamically loaded with imp via a Manager().Queue() on Windows. Is this possible? Here is a minimal testcase to demonstrate what I mean:
import imp
import sys

from multiprocessing import Manager

if __name__ == '__main__':
    s = """
def payload():
    print("It works!")
"""
    mod = imp.new_module('testcase')
    exec s in mod.__dict__
    sys.modules['testcase'] = mod

    payload = mod.payload
    payload()  # It works!

    m = Manager()
    queue = m.Queue()
    queue.put(payload)  # AttributeError: 'module' object has no attribute 'payload'
Note: The use of imp.new_module + exec is just to get the testcase in a single file. The same AttributeError is raised when using imp.load_source.
Note 2: This testcase leaves out all of the Pool/worker code, as the error happens before that point.
Here is the output with full traceback from running the above script:
It works!
Traceback (most recent call last):
  File "testcase.py", line 23, in <module>
    queue.put(payload)  # AttributeError: 'module' object has no attribute 'payload'
  File "<string>", line 2, in put
  File "c:\Users\Andrew\dev\python\lib\multiprocessing\managers.py", line 774, in _callmethod
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Users\Andrew\dev\python\lib\multiprocessing\managers.py", line 240, in serve_client
    request = recv()
AttributeError: 'module' object has no attribute 'payload'
---------------------------------------------------------------------------
Thanks!
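No answer is recorded here either, but a short hedged note: Manager() starts a separate server process, and queue.put(payload) pickles the function by reference, as module 'testcase' plus attribute 'payload'. When the server unpickles it, it tries to import testcase and look up payload there; the dynamically created module only exists in the parent's sys.modules, so the lookup fails with the remote AttributeError shown above. One sketch of a workaround, assuming the source text is available as it is here, is to send the picklable source string instead and rebuild the function on the receiving side (load_payload is an illustrative helper, not part of any API):

# Sketch: a plain string pickles fine, so ship the source and exec it
# on whichever side needs the function.
SRC = """
def payload():
    print("It works!")
"""

def load_payload(src):
    namespace = {}
    exec(src, namespace)  # this spelling works in both Python 2 and 3
    return namespace['payload']

# parent:  queue.put(SRC)
# worker:  fn = load_payload(queue.get()); fn()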
The code is below. When I copy and paste it into my cmd prompt, it throws 'module' object has no attribute 'func', but when I save it as a .py file and execute python test.py, it works fine.
import multiprocessing
import time

def func(msg):
    for i in xrange(3):
        print msg
        time.sleep(1)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    for i in xrange(5):
        msg = "hello %d" % (i)
        pool.apply_async(func, (msg, ))
    pool.close()
    pool.join()
    print "Sub-process(es) done."
Could anyone explain the difference between running Python code at the interactive prompt and running it from a file? Thanks a lot!
This is happening because on Windows, func needs to be pickled and sent to the child process via IPC. In order for the child to unpickle func, it needs to be able to import it from the parent's __main__ module. When this happens in a normal Python script, the child can re-import your script, and __main__ will contain all the functions declared at the top-level of your script, so it works fine. However, in the interactive interpreter, functions you've defined while in the interpreter can't simply be re-imported from a file like in a normal script, so they will not be in __main__ in the child. This is more clear if you use multiprocessing.Process directly to recreate the issue:
>>> def f():
...     print "HI"
...
>>> import multiprocessing
>>> p = multiprocessing.Process(target=f)
>>> p.start()
>>> Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\python27\lib\multiprocessing\forking.py", line 381, in main
    self = load(from_parent)
  File "C:\python27\lib\pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "C:\python27\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "C:\python27\lib\pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "C:\python27\lib\pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'f'
This way, it's clearer that pickle can't find the module. If you add some tracing to pickle.py you can see that 'module' is referring to __main__:
def load_global(self):
    module = self.readline()[:-1]
    name = self.readline()[:-1]
    print("module {} name {}".format(module, name))  # I added this.
    klass = self.find_class(module, name)
    self.append(klass)
Rerunning the same code with that extra print statement yields this:
module multiprocessing.process name Process
module __main__ name f
<same traceback as before>
It's worth noting that this example actually works fine on POSIX platforms, because os.fork() is used to spawn the child processes, which means that any function defined prior to the Pool being created will be available in the child's __main__ module. So, while the above example will work, this one will still fail, because the worker function is defined after creating the Pool (which means after os.fork() is called):
>>> import multiprocessing
>>> p = multiprocessing.Pool(2)
>>> def f(a):
...     print(a)
...
>>> p.apply(f, "hi")
Process PoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 231, in _bootstrap
    self.run()
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 57, in worker
    task = get()
  File "/usr/lib64/python2.6/multiprocessing/queues.py", line 339, in get
    return recv()
AttributeError: 'module' object has no attribute 'f'
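To make the takeaway concrete, here is a minimal sketch of the working arrangement: the function is defined at the top level of a script, before the Pool exists, so the children can find it whether they fork (POSIX) or re-import the script (Windows):

import multiprocessing

def f(a):
    # Defined before the Pool is created and importable from this file,
    # so child processes can locate it when unpickling tasks.
    print(a)

if __name__ == '__main__':
    pool = multiprocessing.Pool(2)
    pool.apply(f, ("hi",))
    pool.close()
    pool.join()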
Can anyone explain to me why the last example (Example 3: Multiprocess wrapper for Net-SNMP) on the following page, https://www.ibm.com/developerworks/aix/library/au-multiprocessing/, does not raise a PicklingError?
I have tried it with my own bound method that updates and returns an instance attribute (similar to the example, which updates and returns an attribute of the instance), and it raises the following error:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
PicklingError: Can't pickle <type 'thread.lock'>: attribute lookup thread.lock failed
Here is my code:
from multiprocessing import Process, Queue
import requests

class MyClass(object):

    def do_request(self, url):
        try:
            self.response = requests.get(url)
        except:
            self.response = None

        return self.response

def make_request(url):
    s = MyClass()
    return s.do_request(url)

# Function run by worker processes
def worker(input, output):
    for func in iter(input.get, 'STOP'):
        result = make_request(func)
        output.put(result)

def main():
    """Runs everything"""
    # clients
    urls = ['http://www.google.com', 'http://www.amazon.com']

    NUMBER_OF_PROCESSES = len(urls)

    # Create queues
    task_queue = Queue()
    done_queue = Queue()

    # submit tasks
    for url in urls:
        task_queue.put(url)

    # Start worker processes
    for i in range(NUMBER_OF_PROCESSES):
        Process(target=worker, args=(task_queue, done_queue)).start()

    # Get and print results
    print 'Unordered results:'
    for i in range(len(urls)):
        print '\t', done_queue.get()

    # Tell child processes to stop
    for i in range(NUMBER_OF_PROCESSES):
        task_queue.put('STOP')
        print "Stopping Process #%s" % i

if __name__ == "__main__":
    main()
The problem is that the return value from requests.get() is not a picklable object. You'll need to extract the information you want and return that to the parent process. Personally, I like to stick to simple types plus lists and dicts for this kind of thing; it keeps the number of bad things that can happen to a minimum.
Here's an experiment that threw a much messier exception on my Linux + Python 2.7 machine, but it gives you an idea of the problem:
>>> import requests
>>> import pickle
>>> r = requests.get('http://google.com')
>>> pickle.dumps(r)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  File "/usr/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  ... many lines removed
  File "/usr/lib/python2.7/copy_reg.py", line 77, in _reduce_ex
    raise TypeError("a class that defines __slots__ without "
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
You can test whether I'm right or not by replacing self.response = requests.get(url) with self.response = "hello".
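Following that advice, a minimal sketch of the fix is to have do_request return plain data instead of the Response object; which fields to keep is your call, and status_code and text here are just examples:

    def do_request(self, url):
        # Return simple, picklable types only; the Response object
        # drags along unpicklable internals.
        try:
            r = requests.get(url)
            self.response = {'url': url,
                             'status': r.status_code,
                             'body': r.text}
        except Exception:
            self.response = None

        return self.response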
I have created this sample program to generalize the issue I am facing:
import multiprocessing
from multiprocessing import Manager

def f(_print):
    print _print
    manager = multiprocessing.Manager()
    dict = manager.dict()
    dict['process_obj'] = multiprocessing.current_process()
    print dict

if __name__ == '__main__':
    process = multiprocessing.Process(target=f, args=('hello function', ))
    process.start()
    process.join()
So how do I store a process object in multiprocessing Manager.dict()?
I assume you're talking about getting this error:
hello function
Process Process-1:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "mp2.py", line 8, in f
    dict['process_obj'] = multiprocessing.current_process()
  File "<string>", line 2, in __setitem__
  File "/usr/local/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
(it's generally a good idea to include "what I got" and "what I expected to get instead" in the question).
The fundamental problem here is that the Process object returned by multiprocessing.current_process() contains instance methods, and pickling it trips over one of them. Instance methods don't pickle properly, and multiprocessing has to save (pickle) and load (unpickle) shared data items to communicate their values from one process to another. See, e.g., Can't pickle <type 'instancemethod'> when using python's multiprocessing Pool.map() and Overcoming Python's limitations regarding instance methods. Note in particular one of the answers in the second: it might be better to figure out some state to send/share, rather than an entire instance. For instance, if the ident of the process suffices, you can do this:
dict['process_obj'] = multiprocessing.current_process().ident
which works fine.
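For completeness, here is the sample's f with that one-line change applied (d replaces the name dict, which shadows the built-in):

def f(_print):
    print _print
    manager = multiprocessing.Manager()
    d = manager.dict()
    # The ident is a plain integer, so it passes through the Manager
    # proxy cleanly, unlike the Process object itself.
    d['process_obj'] = multiprocessing.current_process().ident
    print d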