I want to use multiprocessing for a class method. I found out from this answer that Pool in multiprocessing cannot pickle class methods directly, but there is a workaround: define a function outside the class and add an additional argument (or arguments) to that function (a similar suggestion is also on this blog). I tried to do that with the following simple program, which has MyClass, whose method fun I want to parallelize. However, I am not getting any results (and no error is raised). It seems I am missing something, but I feel I am almost there! Any fix is really appreciated.
import multiprocessing

class MyClass:
    def __init__(self):
        pass

    def fun(self, myList):
        print myList

def unwrap_fun(obj, myList):
    return obj.fun(myList)

obj = MyClass()
mlp = multiprocessing.Pool(processes=multiprocessing.cpu_count())
mlp.imap_unordered(unwrap_fun, (obj, range(1, 10)))
You should call close() and join() from your main process. Try this:
import multiprocessing

class MyClass:
    def fun(self, myList):
        print myList

def unwrap_fun(myList):
    obj = MyClass()
    return obj.fun(myList)

if __name__ == '__main__':
    mlp = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    mlp.imap_unordered(unwrap_fun, range(1, 10))
    mlp.close()
    mlp.join()
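If the same instance has to be reused in the workers instead of being rebuilt for every task, one possible variation is to bind it with functools.partial. This is only a Python 3 sketch: the `scale` attribute and the `run_parallel` helper are made up for illustration, and it assumes the instance itself can be pickled.

```python
import multiprocessing
from functools import partial


class MyClass:
    def __init__(self, scale):
        # `scale` is a hypothetical attribute, only here to show that
        # the instance's state travels to the worker processes.
        self.scale = scale

    def fun(self, item):
        return item * self.scale


def unwrap_fun(obj, item):
    # Module-level function, so the pool can pickle a reference to it.
    return obj.fun(item)


def run_parallel(obj, items):
    # partial(unwrap_fun, obj) is picklable as long as obj itself is.
    with multiprocessing.Pool(processes=2) as pool:
        # Consuming the iterator inside the with block makes sure every
        # task has finished before the pool is shut down.
        return sorted(pool.imap_unordered(partial(unwrap_fun, obj), items))


if __name__ == "__main__":
    print(run_parallel(MyClass(scale=10), range(1, 5)))
```

Iterating over the result of imap_unordered (here through sorted) is what actually waits for the workers, so in this variant no separate close()/join() pair is needed.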
I am trying to understand where my mistake lies and I was hoping you could please help me.
I have this code:
import copy

class FooInd():
    def __init__(self):
        self.a=1

class Planning():
    def foo(self,pop):
        print(pop.a)

    def main():
        ind=FooInd()
        Planning.foo(copy.deepcopy(ind))

if __name__ == "__main__":
    Planning.main()
However I keep receiving this error:
Planning.foo(copy.deepcopy(ind))
TypeError: foo() missing 1 required positional argument: 'pop'
I believe that the mistake is not in the definition of the foo method but in how I instantiate FooInd. However, I have checked the Python documentation for classes and could not find a solution.
Does anyone have a clue what I could try or where I could look?
Many thanks in advance!
You call Planning.foo on the class, not an instance of the class. You provided the second argument it requires, but not the self argument.
You have two choices:
Construct a Planning instance to call foo on:
def main():
    ind=FooInd()
    Planning().foo(copy.deepcopy(ind))
    #       ^^ Makes simple instance to call on
Make foo a classmethod or staticmethod that doesn't require an instance for self:
class Planning():
    @staticmethod  # Doesn't need self at all
    def foo(pop):
        print(pop.a)
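To make the difference between the two choices concrete, here is a small Python 3 sketch that puts the call styles side by side. The names `foo_static` and `foo_class` are made up for illustration, and the methods return the value instead of printing it so the results are easy to compare.

```python
class FooInd:
    def __init__(self):
        self.a = 1


class Planning:
    def foo(self, pop):
        # Instance method: needs a Planning instance as `self`.
        return pop.a

    @staticmethod
    def foo_static(pop):
        # Static method: no implicit first argument at all.
        return pop.a

    @classmethod
    def foo_class(cls, pop):
        # Class method: receives the class itself as `cls`.
        return cls.__name__, pop.a


ind = FooInd()
print(Planning().foo(ind))       # called on an instance
print(Planning.foo_static(ind))  # called on the class, no instance needed
print(Planning.foo_class(ind))   # called on the class, gets the class as cls

# Calling the plain instance method on the class reproduces the error,
# because `ind` is consumed as `self` and nothing is left for `pop`.
try:
    Planning.foo(ind)
except TypeError as exc:
    print("TypeError:", exc)
```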
I think you meant to instantiate Planning before calling methods on it:
import copy

class FooInd():
    def __init__(self):
        self.a = 1

class Planning():
    def foo(self, pop):
        print(pop.a)

    def main(self):
        ind = FooInd()
        self.foo(copy.deepcopy(ind))

if __name__ == "__main__":
    p = Planning()
    p.main()
Output:
1
I am going to attach two blocks of code: the first is the main code that is run, and the second is the testClass file containing a sample class for testing purposes. To understand what's going on, it's probably easiest to run the code on your own. When I call sC.cls.print2(), it says that the self parameter is unfulfilled. Normally when working with classes, self (in this case) would be sC.cls and you wouldn't have to pass it as a parameter. Any advice on why this is occurring is greatly appreciated. I think it has something to do with exec's scope, but even if I run this function in exec it gives the same error, and I can't figure out a way around it. If you'd like any more info, please just ask!
# main script
import testClass

def main():
    inst = testClass.myClass()
    classInfo = str(type(inst)).split()[1].split("'")[1].split('.')
    print(classInfo)

    class StoreClass:
        def __init__(self):
            pass

    exec('from {} import {}'.format(classInfo[0], classInfo[1]))
    sC = StoreClass()
    exec('sC.cls = {}'.format(classInfo[1]))
    print(sC.cls)
    sC.cls.print2()

if __name__ == '__main__':
    main()

# testClass.py
class myClass:
    def printSomething(self):
        print('hello')

    def print2(self):
        print('hi')
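One thing that may explain the error: sC.cls ends up holding the class itself, not an instance, so sC.cls.print2() has nothing to bind to self. Below is a minimal Python 3 sketch of an exec-free approach, assuming the goal is just to keep a reference to the class of an existing object. The inline `myClass` stands in for the one in testClass, and `print2` returns its text here instead of printing it.

```python
class myClass:
    def print2(self):
        return 'hi'


class StoreClass:
    pass


inst = myClass()
sC = StoreClass()
sC.cls = type(inst)          # the class object itself, no exec needed

print(sC.cls)                # still a class, not an instance
print(sC.cls().print2())     # instantiate first, then call the method
print(sC.cls.print2(inst))   # or pass an existing instance explicitly as self
```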
As part of parallelizing some existing code (with multiprocessing), I ran into a situation where something similar to the class below needs to be pickled.
Starting from:
import pickle
from functools import lru_cache

class Test:
    def __init__(self):
        self.func = lru_cache(maxsize=None)(self._inner_func)

    def _inner_func(self, x):
        # In reality this will be slow-running
        return x
calling
t = Test()
pickle.dumps(t)
returns
_pickle.PicklingError: Can't pickle <functools._lru_cache_wrapper object at 0x00000190454A7AC8>: it's not the same object as __main__.Test._inner_func
which I don't really understand. By the way, I also tried a variation where the name of _inner_func was func as well, that didn't change things.
If anybody is interested, this can be solved by using __getstate__ and __setstate__ like this:
from functools import lru_cache
from copy import copy

class Test:
    def __init__(self):
        self.func = lru_cache(maxsize=None)(self._inner_func)

    def _inner_func(self, x):
        # In reality this will be slow-running
        return x

    def __getstate__(self):
        result = copy(self.__dict__)
        result["func"] = None
        return result

    def __setstate__(self, state):
        self.__dict__ = state
        self.func = lru_cache(maxsize=None)(self._inner_func)
As detailed in the comments, the pickle module has issues when dealing with decorators. See this question for more details:
Pickle and decorated classes (PicklingError: not the same object)
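For what it's worth, here is a small self-contained check of the __getstate__/__setstate__ approach (Python 3). It is only a sketch: the `roundtrip` helper is made up for illustration, `_inner_func` doubles its input so the result is visible, and this variant drops the `func` entry from the state instead of setting it to None.

```python
import pickle
from functools import lru_cache


class Test:
    def __init__(self):
        self.func = lru_cache(maxsize=None)(self._inner_func)

    def _inner_func(self, x):
        # Stand-in for the slow-running computation.
        return x * 2

    def __getstate__(self):
        # Leave the unpicklable cache wrapper out of the pickled state.
        state = self.__dict__.copy()
        state.pop("func", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Rebuild the wrapper; the new cache starts out empty.
        self.func = lru_cache(maxsize=None)(self._inner_func)


def roundtrip(obj):
    # Hypothetical helper: pickle and immediately unpickle an object.
    return pickle.loads(pickle.dumps(obj))


if __name__ == "__main__":
    t = Test()
    t.func(3)                        # fills the cache of the original
    clone = roundtrip(t)
    print(clone.func(3))             # recomputed in the clone
    print(clone.func.cache_info())   # shows a miss, not a hit
```

Note that the cached values are not carried over: each unpickled copy (for example in every worker process) recomputes results the first time they are requested.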
Use methodtools.lru_cache so that you don't have to create a new cache function in __init__:
import pickle
from methodtools import lru_cache

class Test:
    @lru_cache(maxsize=None)
    def func(self, x):
        # In reality this will be slow-running
        return x

if __name__ == '__main__':
    t = Test()
    print(pickle.dumps(t))
This requires installing methodtools from PyPI:
pip install methodtools
I would like to find all instances in the code where np.random.seed is called (without using grep). In order to set a breakpoint in ipdb, I tried to find the source file with
import inspect; inspect.getsourcefile(np.random.seed)
but it throws a TypeError because np.random.seed is a built-in method (it is implemented in C).
Is it possible to watch any calls to np.random.seed by modifying something in the main source file?
Additionally, it would be useful to patch this method, e.g. to additionally log each call (or to invoke a debugger):
def new_random_seed(seed):
    """
    This method should be called instead whenever np.random.seed
    is called in any module that is invoked during the execution of
    the main script
    """
    print("Called with seed {}".format(seed))
    #or: import ipdb; ipdb.set_trace()
    # Pass the seed through (assumes np.random.seed still refers to the original)
    return np.random.seed(seed)
Maybe using a mock framework is the way to go?
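A mock-based approach along these lines might look like the following Python 3 sketch. To keep it self-contained it patches the standard library's random.seed as a stand-in; for NumPy the call would presumably be log_calls(np.random, "seed"), which I have not verified here. The `log_calls` helper is a made-up name, not part of any library.

```python
import contextlib
import random
import traceback
from unittest import mock


@contextlib.contextmanager
def log_calls(owner, name):
    """Temporarily replace owner.<name> with a wrapper that logs each call."""
    original = getattr(owner, name)
    calls = []

    def logged(*args, **kwargs):
        # extract_stack(limit=2) returns [caller, this wrapper].
        caller = traceback.extract_stack(limit=2)[0]
        calls.append((args, kwargs, caller.lineno))
        print("{} called with args={} kwargs={} from {}:{}".format(
            name, args, kwargs, caller.filename, caller.lineno))
        # or: import ipdb; ipdb.set_trace()
        return original(*args, **kwargs)

    # patch.object restores the original attribute when the block exits.
    with mock.patch.object(owner, name, new=logged):
        yield calls


if __name__ == "__main__":
    with log_calls(random, "seed") as calls:
        random.seed(42)
        first = random.random()
        random.seed(42)
        second = random.random()
    print(first == second, len(calls))
```

One caveat: this only catches calls that look the function up through the patched module at call time. Code that ran `from numpy.random import seed` before the patch keeps its own reference to the original function and will not be logged.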
The second question concerns the scenario in which a class B in a library uses a class A (B holds an instance of A, as in the code below), and I want to use the functionality of class B but override a function it uses from class A, without modifying classes A and B. Probably I should use mocking, but I am not sure about the overhead, so I wrote the following:
# in library
class A():
    def __init__(self, name):
        self.name = name

    def work(self):
        print("{} working".format(self.name))

class B():
    def __init__(self):
        self.A = A("Machine")

    def run_task(self):
        self.A.work()

# in main script
# Cannot change classes A and B, so make a subclass C
import types

class C(B):
    def __init__(self, modified_work):
        super().__init__()
        self.A.work = types.MethodType(modified_work, self.A)  # MethodType for self

b = B()
b.run_task()
modified_work = lambda self: print("{} working faster".format(self.name))
c = C(modified_work)
c.run_task()
The output is:
Machine working
Machine working faster
Is this good style?
This might be a simpler solution to your second question:
# lib.py
class A():
    def work(self):
        print('working')

class B():
    def __init__(self):
        self.a = A()

    def run(self):
        self.a.work()
Then in your code:
import lib

class A(lib.A):
    def work(self):
        print('hardly working')

lib.A = A
b = lib.B()
b.run()
Or:
import lib

class AA(lib.A):
    def work(self):
        print('hardly working')

class BB(lib.B):
    def __init__(self):
        self.a = AA()

b = lib.B()
b.run()
b = BB()
b.run()
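Since the question mentions mocking: if the override should only apply temporarily, unittest.mock.patch.object from the standard library can swap the method in and restore it afterwards. A minimal Python 3 sketch, with stand-ins for the library classes from the question (the methods return strings instead of printing, and `faster_work` is a made-up name):

```python
from unittest import mock


# Stand-ins for the library classes from the question.
class A:
    def __init__(self, name):
        self.name = name

    def work(self):
        return "{} working".format(self.name)


class B:
    def __init__(self):
        self.A = A("Machine")

    def run_task(self):
        return self.A.work()


def faster_work(self):
    # Replacement for A.work; gets the A instance as self.
    return "{} working faster".format(self.name)


b = B()
print(b.run_task())

# A.work is replaced only inside the with block and restored on exit.
with mock.patch.object(A, "work", faster_work):
    print(b.run_task())

print(b.run_task())
```

Unlike the per-instance types.MethodType approach, this patches the class, so every A instance is affected while the block is active.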
I haven't used the threading library much in Python, so my confidence is a little shaky compared to concurrency in other languages... is this a correct way to use a threading.RLock() object as a mutex?
import threading

class MyObj(object):
    def __init__(self):
        self.mutex = threading.RLock()
        ...

    def setStatistics(self, statistics):
        with self.mutex:
            self._statistics = statistics

    def getStatistics(self):
        with self.mutex:
            return self._statistics.copy()
In particular I want to make sure that the self._statistics.copy() step happens while the mutex is still acquired.
Is there any other gotcha I need to be aware of? The self._statistics object is a large numpy array and I need to make sure it is transferred properly and in a consistent state between threads.
Yep, that's the right way to use it. When you use this statement:
with self.mutex:
    return self._statistics.copy()
The lock won't be released until after the self._statistics.copy() operation completes, so it's safe. Here's a demo:
import threading

class MyLock(threading._RLock):
    def release(self):
        print("releasing")
        super(MyLock, self).release()

class Obj():
    def test(self):
        print("in test")

l = MyLock()
obj = Obj()

def f():
    with l:
        return obj.test()

f()
Output:
in test
releasing
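On the "any other gotcha" part, one thing that might be worth considering is copying on the way in as well, so the caller cannot keep mutating the stored object outside the lock. A minimal Python 3 sketch, assuming a plain list as a stand-in for the NumPy array (the snake_case method names and the copy in the setter are my additions, not part of the original code):

```python
import threading


class MyObj:
    def __init__(self):
        self._mutex = threading.RLock()
        self._statistics = []

    def set_statistics(self, statistics):
        # Copy before storing, so later changes made by the caller
        # do not affect the data guarded by the lock.
        snapshot = list(statistics)
        with self._mutex:
            self._statistics = snapshot

    def get_statistics(self):
        with self._mutex:
            # The copy is made before the with block releases the lock.
            return list(self._statistics)


obj = MyObj()
data = [1, 2, 3]
obj.set_statistics(data)
data.append(4)                 # does not leak into obj
print(obj.get_statistics())

result = obj.get_statistics()
result.append(99)              # does not leak into obj either
print(obj.get_statistics())
```

With a NumPy array the same idea would use statistics.copy() in both places; whether the extra copy is affordable for a large array is a trade-off only the caller can judge.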