This is a follow-up to a question I asked here. I tried to parallelize my code as follows:
import concurrent.futures as futures

class A(object):
    def __init__(self, q):
        self.p = q

    def add(self, num):
        r = 0
        for _ in xrange(10000):
            r += num
        return r

num_instances = 5
instances = []
for i in xrange(num_instances):
    instances.append(A(i))

n = 20
# Create a pool of processes. By default, one is created for each CPU in your machine.
results = []
pool = futures.ProcessPoolExecutor(max_workers=num_instances)
for inst in instances:
    future = pool.submit(inst.add, n)
    results.append(future.result())
pool.join()
print(results)
But I got this error:

Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
Any idea why I get this error? I know we can use the map function to assign jobs, but I intentionally don't want to do that.
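For reference, here is a minimal sketch of the map-based alternative mentioned above (my illustration, not the asker's code); it sidesteps the pickling problem by submitting a module-level function instead of a bound method:

import concurrent.futures as futures

def add(num, repeats=10000):
    # A module-level function pickles cleanly, unlike a bound method on Python 2.
    r = 0
    for _ in range(repeats):
        r += num
    return r

if __name__ == '__main__':
    with futures.ProcessPoolExecutor(max_workers=5) as pool:
        # map applies the function to each input and yields results in order.
        results = list(pool.map(add, range(5)))
    print(results)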
Related
I want to start by saying that I am aware this error message has been posted multiple times, but I cannot seem to understand how those posts apply to my case. So I want to try my luck:
I have a DataFrame "df" and I am trying to perform parallel processing of subsets of that DataFrame:
lst = []
for i in range(1, 2):
    pool = ThreadPool(processes=4)
    async_result = pool.apply_async(helper.Helper.transform(df.copy(), i))
    lst.append(async_result)

results = []
for item in lst:
    currentitem = item.get()
    results.append(currentitem)
Helper method:

@staticmethod
def transform(df, i):
    return df
I usually code in Java, and I need to do some things in Python for a class. I just don't understand why I get this error in this case:
Traceback (most recent call last):
  File "C:/Users/Barry/file.py", line 28, in <module>
    currentitem = item.get()
  File "C:\Users\Barry\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 768, in get
    raise self._value
  File "C:\Users\Barry\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
TypeError: 'DataFrame' object is not callable
A print inside the thread function, or before creating the thread, produces the expected output.
The issue is with the line:
async_result = pool.apply_async(helper.Helper.transform(df.copy(), i))
The catch: you're calling the function transform before passing it to apply_async. As a result, apply_async receives a DataFrame, "thinks" it's a function, and tries to call it asynchronously. The result is the exception you're seeing, and this result is saved as part of the AsyncResult object.
To fix it just change this line to:
async_result = pool.apply_async(helper.Helper.transform, (df.copy(), i))
Note that apply_async takes two arguments here: the function and a tuple of arguments for that function.
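Putting it together, a minimal self-contained sketch of the corrected pattern (the Helper class and DataFrame below are stand-ins for the asker's real code):

from multiprocessing.pool import ThreadPool
import pandas as pd

class Helper:
    @staticmethod
    def transform(df, i):
        # Placeholder for the real transformation.
        return df

if __name__ == '__main__':
    df = pd.DataFrame({'a': [1, 2, 3]})
    pool = ThreadPool(processes=4)
    lst = []
    for i in range(1, 2):
        # Pass the callable and its arguments separately; apply_async
        # then calls Helper.transform(df.copy(), i) in a worker thread.
        async_result = pool.apply_async(Helper.transform, (df.copy(), i))
        lst.append(async_result)
    results = [item.get() for item in lst]
    pool.close()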
I've encountered an error that I cannot explain while trying to retrieve the results of futures submitted to a process pool. I've stored the future objects in a list, and my best guess is that the future object reference is being deleted somehow, so that the list comprehension fails.
The error is at results = [j.result() for j in jobs] in async_jobs below. The traceback,
  in <listcomp>
    results = [j.result() for j in jobs]
  File "lib/python3.6/concurrent/futures/_base.py", line 405, in result
    return self.__get_result()
  File "lib/python3.6/concurrent/futures/_base.py", line 357, in __get_result
    raise self._exception
IndexError: list index out of range
non-MVCE code
import concurrent.futures as futures

def _job(*args, **kwargs):
    """Does work with a thread pool and returns True."""
    def _thread_job(*args, **kwargs):
        """Can be defined here because we are using threading and don't need to pickle."""
        ...
        return None

    with futures.ThreadPoolExecutor(max_workers=4) as t_executor:
        jobs = []
        for i in range(...):
            f = t_executor.submit(_thread_job, ..., ...)
            jobs.append(f)
        results = [j.result() for j in jobs]
    return True
def async_jobs():
    with futures.ProcessPoolExecutor(max_workers=8) as p_executor:
        jobs = []
        for i in range(...):
            f = p_executor.submit(_job, ..., ...)
            jobs.append(f)
        results = [j.result() for j in jobs]

if __name__ == '__main__':
    async_jobs()
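One mechanism worth noting here (a minimal standalone sketch, not the code above): Future.result() re-raises any exception that was raised inside the submitted function, so an IndexError raised deep inside _job surfaces at the list comprehension in async_jobs, which can make the traceback misleading.

import concurrent.futures as futures

def boom():
    # Raises IndexError inside the worker; the exception is stored on the Future.
    return [][0]

if __name__ == '__main__':
    with futures.ProcessPoolExecutor(max_workers=1) as ex:
        fut = ex.submit(boom)
        # result() re-raises the worker's exception at the call site,
        # so the traceback points here rather than into boom's own frame.
        fut.result()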
Win 7, x64, Python 2.7.12
In the following code I am setting off some pool processes to do a trivial multiplication via the multiprocessing.Pool.map() method. The output data is collected in myList_1.
NOTE: this is a stripped down simplification of my actual code. There are multiple lists involved in the real application, all huge.
import multiprocessing
import numpy as np

def createLists(branches):
    firstList = branches[:] * node
    return firstList

def init_process(lNodes):
    global node
    node = lNodes
    print 'Starting', multiprocessing.current_process().name

if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    nodes = mgr.list()
    pool_size = multiprocessing.cpu_count()
    branches = [i for i in range(1, 21)]
    lNodes = 10
    splitBranches = np.array_split(branches, int(len(branches)/pool_size))

    pool = multiprocessing.Pool(processes=pool_size, initializer=init_process, initargs=[lNodes])
    myList_1 = pool.map(createLists, splitBranches)

    pool.close()
    pool.join()
I now add an extra calculation to createLists() and try to pass back both lists.
import multiprocessing
import numpy as np

def createLists(branches):
    firstList = branches[:] * node
    secondList = branches[:] * node * 2
    return firstList, secondList

def init_process(lNodes):
    global node
    node = lNodes
    print 'Starting', multiprocessing.current_process().name

if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    nodes = mgr.list()
    pool_size = multiprocessing.cpu_count()
    branches = [i for i in range(1, 21)]
    lNodes = 10
    splitBranches = np.array_split(branches, int(len(branches)/pool_size))

    pool = multiprocessing.Pool(processes=pool_size, initializer=init_process, initargs=[lNodes])
    myList_1, myList_2 = pool.map(createLists, splitBranches)

    pool.close()
    pool.join()
This raises the following error and traceback:
Traceback (most recent call last):
  File "<ipython-input-6-ff188034c708>", line 1, in <module>
    runfile('C:/Users/nr16508/Local Documents/Inter Trab Angle/Parallel/scratchpad.py', wdir='C:/Users/nr16508/Local Documents/Inter Trab Angle/Parallel')
  File "C:\Users\nr16508\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "C:\Users\nr16508\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/Users/nr16508/Local Documents/Inter Trab Angle/Parallel/scratchpad.py", line 36, in <module>
    myList_1, myList_2 = pool.map(createLists, splitBranches)
ValueError: too many values to unpack
When I tried to put both lists into one to pass back, i.e...
return [firstList, secondList]
......
myList = pool.map(createLists, splitBranches)
...the output becomes too jumbled for further processing.
Is there an method of collecting more than one list from pooled processes?
This question has nothing to do with multiprocessing or thread pooling. It is simply about how to unzip lists, which can be done with the standard zip(*...) idiom.
myList_1, myList_2 = zip(*pool.map(createLists, splitBranches))
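To see why this works, a tiny standalone example: pool.map returns one (firstList, secondList) tuple per chunk, and zip(*...) regroups those tuples into two sequences:

# What pool.map returns: one (firstList, secondList) tuple per chunk.
pairs = [([1, 2], [2, 4]), ([3, 4], [6, 8])]
firstLists, secondLists = zip(*pairs)
print(firstLists)   # ([1, 2], [3, 4])
print(secondLists)  # ([2, 4], [6, 8])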
How can I call a worker from a worker? This seems to be a simple representation of the puzzle I'm trying to solve:
import time
from circuits import BaseComponent, Worker, Debugger, task, handler

class App(BaseComponent):

    def factorial(self, n):
        time.sleep(1)
        if n > 1:
            nn = yield self.call(task(self.factorial, n - 1))
            return n * nn.value
        else:
            return 1

    @handler("started")
    def started(self, *args):
        Worker().register(self)
        rv = yield self.call(task(self.factorial, 5))
        print(rv.value)
        self.stop()

(App() + Debugger()).run()
Here's the error output:
ERROR (<task[*] (<bound method App.factorial of <App/* 26821:MainThread (queued=1) [R]>>, 5 )>) (<class 'AttributeError'>): AttributeError("'generator' object has no attribute 'task_event'",)
Traceback (most recent call last):
  File "/usr/lib/python3.4/site-packages/circuits/core/manager.py", line 841, in processTask
    task_state.task_event = event
AttributeError: 'generator' object has no attribute 'task_event'
It also doesn't terminate because it failed before the stop() call.
short short version:
I am having trouble parallelizing code which uses instance methods.
Longer version:
This Python code produces the error:
Error
Traceback (most recent call last):
  File "/Users/gilzellner/dev/git/3.2.1-build/cloudify-system-tests/cosmo_tester/test_suites/stress_test_openstack/test_file.py", line 24, in test
    self.pool.map(self.f, [self, url])
  File "/Users/gilzellner/.virtualenvs/3.2.1-build/lib/python2.7/site-packages/pathos/multiprocessing.py", line 131, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/Users/gilzellner/.virtualenvs/3.2.1-build/lib/python2.7/site-packages/multiprocess/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Users/gilzellner/.virtualenvs/3.2.1-build/lib/python2.7/site-packages/multiprocess/pool.py", line 567, in get
    raise self._value
AttributeError: 'Test' object has no attribute 'get_type'
This is a simplified version of a real problem I have.
import urllib2
from time import sleep
from os import getpid
import unittest
from pathos.multiprocessing import ProcessingPool as Pool

class Test(unittest.TestCase):

    def f(self, x):
        print urllib2.urlopen(x).read()
        print getpid()
        return

    def g(self, y, z):
        print y
        print z
        return

    def test(self):
        url = "http://nba.com"
        self.pool = Pool(processes=1)
        for x in range(0, 3):
            self.pool.map(self.f, [self, url])
            self.pool.map(self.g, [self, url, 1])
        sleep(10)
I am using pathos.multiprocessing due to the recommendation here:
Multiprocessing: Pool and pickle Error -- Pickling Error: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
Before using pathos.multiprocessing, the error was:
"PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed"
You're using the multiprocessing map method incorrectly.
According to the Python docs, Pool.map is:
A parallel equivalent of the map() built-in function (it supports only one iterable argument though).
Whereas the standard map will:
Apply function to every item of iterable and return a list of the results.
Example usage:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
What you're looking for is the apply_async method:
def test(self):
    url = "http://nba.com"
    self.pool = Pool(processes=1)
    for x in range(0, 3):
        self.pool.apply_async(self.f, args=(self, url))
        self.pool.apply_async(self.g, args=(self, url, 1))
    sleep(10)
The error indicates that you are trying to read an attribute which is not defined on the Test object:

AttributeError: 'Test' object has no attribute 'get_type'

In your class Test, you haven't defined a get_type method or any such attribute, hence the error.