multiprocessing and modules - python

I am attempting to use multiprocessing to call a derived class's member function that is defined in a different module. There seem to be several questions dealing with calling class methods from the same module, but none from different modules. For example, if I have the following structure:
main.py
multi/
    __init__.py (empty)
    base.py
    derived.py
main.py
from multi.derived import derived
from multi.base import base

if __name__ == '__main__':
    base().multiFunction()
    derived().multiFunction()
base.py
import multiprocessing;
# The following two functions wrap calling a class method
def wrapPoolMapArgs(classInstance, functionName, argumentLists):
    className = classInstance.__class__.__name__
    return zip([className] * len(argumentLists), [functionName] * len(argumentLists), [classInstance] * len(argumentLists), argumentLists)

def executeWrappedPoolMap(args, **kwargs):
    classType = eval(args[0])
    funcType = getattr(classType, args[1])
    funcType(args[2], args[3:], **kwargs)

class base:
    def multiFunction(self):
        mppool = multiprocessing.Pool()
        mppool.map(executeWrappedPoolMap, wrapPoolMapArgs(self, 'method', range(3)))

    def method(self,args):
        print "base.method: " + args.__str__()
derived.py
from base import base

class derived(base):
    def method(self,args):
        print "derived.method: " + args.__str__()
Output
base.method: (0,)
base.method: (1,)
base.method: (2,)
Traceback (most recent call last):
File "e:\temp\main.py", line 6, in <module>
derived().multiFunction()
File "e:\temp\multi\base.py", line 15, in multiFunction
mppool.map(executeWrappedPoolMap, wrapPoolMapArgs(self, 'method', range(3)))
File "C:\Program Files\Python27\lib\multiprocessing\pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "C:\Program Files\Python27\lib\multiprocessing\pool.py", line 567, in get
raise self._value
NameError: name 'derived' is not defined
I have tried fully qualifying the class name in the wrapPoolMapArgs function, but that just gives the same error, saying multi is not defined.
Is there some way to achieve this, or must I restructure to have all classes in the same package if I want to use multiprocessing with inheritance?

This is almost certainly caused by the ill-advised eval-based approach to dynamically invoking specific code.
In executeWrappedPoolMap (in base.py), you convert a str name of a class to the class itself with classType = eval(args[0]). But eval is executed in the scope of executeWrappedPoolMap, which is in base.py, and can't find derived (because the name doesn't exist in base.py).
Stop passing the name, and pass the class object itself, passing classInstance.__class__ instead of classInstance.__class__.__name__; multiprocessing will pickle it for you, and you can use it directly on the other end, instead of using eval (which is nearly always wrong; it's code smell of the strongest sort).
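A minimal Python 3 sketch of that fix follows. The snake_case names and the Base/Derived classes are illustrative stand-ins for the question's code. An explicit pickle round trip stands in for what Pool.map does to each task, so the dispatch can be checked without starting worker processes.

```python
import pickle

def wrap_pool_map_args(instance, function_name, argument_lists):
    # Ship the class object itself rather than its name. Pickle stores a
    # reference (module + qualified name) that the worker re-imports itself.
    cls = type(instance)
    return [(cls, function_name, instance, args) for args in argument_lists]

def execute_wrapped_pool_map(packed):
    cls, function_name, instance, args = packed
    # No eval: the class arrives as an object, so nothing is looked up by
    # name in the globals of the module that defines this function.
    method = getattr(cls, function_name)
    return method(instance, args)

class Base:
    def method(self, args):
        return "base.method: %r" % (args,)

class Derived(Base):
    def method(self, args):
        return "derived.method: %r" % (args,)

# Stand-in for Pool.map: pickle and unpickle each task, as the pool would,
# then run it in-process.
tasks = wrap_pool_map_args(Derived(), "method", range(3))
results = [execute_wrapped_pool_map(pickle.loads(pickle.dumps(t))) for t in tasks]
print(results)  # ['derived.method: 0', 'derived.method: 1', 'derived.method: 2']
```

With a real pool, the last lines become `mppool.map(execute_wrapped_pool_map, wrap_pool_map_args(self, 'method', range(3)))`. The worker function and the classes still have to be defined at module top level so that they can be pickled by reference.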
BTW, the reason the traceback isn't super helpful is that the exception is raised in the worker, caught, pickled, sent back to the main process and re-raised. The traceback you see is from that re-raise, not from where the NameError actually occurred (the eval line).
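If you need the real failure location in the parent, one option is to format the traceback inside the worker and send it back as text in the exception message. This is a sketch, and `risky` is a hypothetical worker body. Python 3.4 and later already attach the worker's formatted traceback as the `__cause__` of the re-raised exception, so this matters mainly on Python 2.7, as in the question.

```python
import traceback

def risky(x):
    # Hypothetical worker body that fails, like the eval line in the question
    return missing_variable + x

def call_with_traceback(x):
    """Top-level (hence picklable) wrapper to pass to Pool.map instead of risky."""
    try:
        return risky(x)
    except Exception:
        # Format the traceback while still in the worker, where the failing
        # frame is known, and carry it back as plain text in the message.
        raise RuntimeError("error in worker:\n" + traceback.format_exc())

message = ""
try:
    call_with_traceback(1)
except RuntimeError as err:
    message = str(err)
print(message)  # includes the NameError and the line inside risky()
```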

Related

Python: how to access a parent's modules and functions from a child module

I have three files that all contain classes with the same names but slightly different definitions. Some methods in these classes are identical across all three files, so I abstracted them out to another file, utils.py, where they are defined within a "template" version of the original class. The problem is that these methods invoke functions and modules that exist in the original files but not in this new one.
My original approach was to use multiple class inheritance, which would initialize the template class within the scope of the parent class, allowing access to all the functions and modules it requires. However, I was instructed to avoid multiple class inheritance and to simply import the utils file.
Importing does not apply the same scoping logic that I described above for inheritance, and that is where my problem arises. I have created a small example to show what I mean. I am using a module called datajoint. You don't need to know much about it except that a schema is basically a table or collection of tables in a database.
schemas.py
import datajoint as dj
from datetime import datetime
import utils

dj.conn()
schema = dj.Schema('adib_example1')
schema.drop()
schema = dj.Schema('adib_example1')

def test_print():
    print("test")

@schema
class Subject(dj.Lookup):
    definition = """
    subject_id: int
    """
    contents = [dict(subject_id=1)]

@schema
class Session(dj.Computed):
    definition = """
    -> Subject
    time: varchar(30)
    """

    def make(self, key):
        utils.SessionTemplate.make(self,key)

Session.populate() # invokes Session's make(), passing Subject's primary key
Approach 1
Import scoping not working like inheritance
utils.py
class SessionTemplate():
    @staticmethod
    def make(table, key):
        test_print() # parent function usage example
        table.time = f"{datetime.now()}" # parent module usage example
        new_entry = dict(**key, time=table.time)
        table.insert1(new_entry)
error
Traceback (most recent call last):
File "/home/.anaconda/imported_make/schemas.py", line 30, in <module>
Session.populate() # invokes Session's make(), passing Subject's primary key
File "/opt/conda/lib/python3.9/site-packages/datajoint/autopopulate.py", line 153, in populate
make(dict(key))
File "/home/.anaconda/imported_make/schemas.py", line 28, in make
utils.SessionTemplate.make(self,key)
File "/home/.anaconda/imported_make/utils.py", line 5, in make
test_print() # parent function usage example
NameError: name 'test_print' is not defined
Approach 2
Importing schemas.py into utils.py works, but it requires prefixing every imported function and module with schemas., which is not practical in my case.
utils.py
import schemas

class SessionTemplate():
    @staticmethod
    def make(table, key):
        schemas.test_print() # parent function usage example
        table.time = f"{schemas.datetime.now()}" # parent module usage example
        new_entry = dict(**key, time=table.time)
        table.insert1(new_entry)
Approach 3
Importing with * to avoid having to add schemas. before each parent function/module somehow does not provide access to the parent's modules and functions.
from schemas import *

class SessionTemplate():
    @staticmethod
    def make(table, key):
        test_print() # parent function usage example
        table.time = f"{datetime.now()}" # parent module usage example
        new_entry = dict(**key, time=table.time)
        table.insert1(new_entry)
error
Traceback (most recent call last):
File "/home/.anaconda/imported_make/run.py", line 1, in <module>
import schemas
File "/home/.anaconda/imported_make/schemas.py", line 30, in <module>
Session.populate() # invokes Session's make(), passing Subject's primary key
File "/opt/conda/lib/python3.9/site-packages/datajoint/autopopulate.py", line 153, in populate
make(dict(key))
File "/home/.anaconda/imported_make/schemas.py", line 28, in make
utils.SessionTemplate().make(self,key)
File "/home/.anaconda/imported_make/utils.py", line 7, in make
test_print() # parent function usage example
NameError: name 'test_print' is not defined
I know import * is bad practice, but it would have been fine in this instance if it worked, and I'm not sure why it doesn't.
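The traceback for Approach 3 shows run.py importing schemas, and schemas.py runs `import utils` near its top, which points to a circular import. While schemas is still only partially executed, `from schemas import *` in utils.py copies the names bound so far (such as datetime) but not test_print, which is defined later. The sketch below reproduces this with two throwaway modules written to a temporary directory. The module names schemas_demo and utils_demo are hypothetical, and datajoint is left out.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-ins for schemas.py and utils.py (datajoint left out).
SCHEMAS = textwrap.dedent("""
    from datetime import datetime
    import utils_demo            # runs utils_demo.py before test_print exists

    def test_print():
        print("test")
""")
UTILS = textwrap.dedent("""
    from schemas_demo import *   # copies what schemas_demo has defined so far
""")

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "schemas_demo.py").write_text(SCHEMAS)
    Path(tmp, "utils_demo.py").write_text(UTILS)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        schemas_mod = importlib.import_module("schemas_demo")  # like run.py
        utils_mod = sys.modules["utils_demo"]
    finally:
        sys.path.remove(tmp)

print("datetime" in vars(utils_mod))       # True: bound before utils_demo ran
print("test_print" in vars(utils_mod))     # False: defined after the snapshot
print(hasattr(schemas_mod, "test_print"))  # True: schemas_demo itself finished
```

Moving shared helpers such as test_print into a third module that both files import, or passing them into make() as arguments, would remove the dependence on import order.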
boss.py
class tasks():
    def job1(self, input):
        # do something (placeholder: returns the input unchanged)
        output = input
        return output

    def job2(self, input):
        # do something (placeholder: returns the input unchanged)
        output = input
        return output

worker.py
import boss

input_value = "xyz"
output1 = boss.tasks().job1(input_value)
output2 = boss.tasks().job2(input_value)

Class instance fails isinstance check

Some of my code fails on CI (local runs do not fail).
The problem is that a class instance fails an isinstance() check.
Code:
File: main.py
class MyController(SuperController):
    # Overrides default definition of get_variables_context()
    from my_options import get_variables_context

File: my_options.py
...
def get_variables_context(self: SuperController, **kwargs):
    from main import MyController
    self: MyController
    print(f"type(self) is {type(self)} (it {'IS' if (isinstance(self, MyController)) else 'IS NOT'} a subclass of MyController)")
    _super = super(MyController, self).get_variables_context(**kwargs) or dict()
    _super.update(result)
    return _super
Output and error:
type(self) is <class '__main__.SomeController'> (it IS NOT a subclass of SomeController)
Traceback (most recent call last):
File "main.py", line 24, in <module>
SomeController.main(**params)
File "/builds/RND/my/rcv-nginx/tests/nginx_tests/flow.py", line 391, in main
_tests_suite, _, _ = self.prepare()
File "/builds/RND/my/rcv-nginx/tests/nginx_tests/flow.py", line 359, in prepare
context['variables_context'] = self.get_variables_context(**context)
File "/builds/RND/my/tests/integration/my_options.py", line 81, in get_variables_context
_super = super(SomeController, self).get_variables_context(**kwargs) or dict()
TypeError: super(type, obj): obj must be an instance or subtype of type
I've found the solution while investigating the root cause.
In the local run, ...
I actually invoke Python's unittest, which calls main.py, which creates the class MyController, which in turn calls my_options.py; the class is added to the loaded module 'main'.
Then MyController.get_variables_context asks for the module 'main', which is already loaded, and then for the class MyController in that module, so the same class object is returned and the type check succeeds.
In the CI run, ...
I call main.py directly with the argument "test" (which should create a controller and run all tests from it via unittest), so the class MyController is created inside the module __main__. MyController.get_variables_context still asks for the MyController class in main.py, but no module named 'main' is loaded in this case, so Python loads main.py a second time, creates a new MyController class, and returns that one.
So, basically, the answer is ...
to move MyController out of main.py into another file, e.g. controller.py, so that the class is only ever created by importing that module.
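The root cause can be reproduced on its own. This sketch uses a hypothetical file main_demo.py as a stand-in for main.py. It runs the same source once as a script (runpy executes it under the name __main__, as `python main.py` would) and once as an import (as `from main import MyController` does), and it ends up with two unrelated MyController classes.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for main.py: it just defines the controller class.
SOURCE = "class MyController:\n    pass\n"

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp, "main_demo.py")
    script.write_text(SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Like `python main_demo.py`: the file runs under the name __main__.
        as_script = runpy.run_path(str(script), run_name="__main__")
        # Like `from main_demo import MyController` inside my_options.py:
        # main_demo is not in sys.modules yet, so the file runs a second time.
        as_module = importlib.import_module("main_demo")
    finally:
        sys.path.remove(tmp)

ScriptController = as_script["MyController"]
ModuleController = as_module.MyController
obj = ScriptController()

print(ScriptController.__module__, ModuleController.__module__)  # __main__ main_demo
print(ScriptController is ModuleController)  # False: two distinct classes
print(isinstance(obj, ModuleController))     # False, as on the CI runner
```

After the class moves into a module that is only ever imported, every importer gets the same class object from sys.modules.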

Pickling dynamically created types

I've been trying to get some dynamically created types (i.e. ones created by calling the 3-argument form of type()) to pickle and unpickle nicely. I've been using this module-switching trick to hide the details from users of the module and give clean semantics.
I've learned several things already:
The type must be findable with getattr on the module itself
The type must be consistent with what getattr finds, that is to say if we call pickle.dumps(o) then it must be true that type(o) == getattr(module, 'name of type')
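Those two rules can be checked with a plain module object in place of the Magic proxy. This is a sketch and does not fix the proxy itself. The module name dyn_types and the class name Last are hypothetical, and the generated class is attached at the top level of a real types.ModuleType that is registered in sys.modules.

```python
import pickle
import sys
import types

def dump(self):
    return b"Some data"  # dummy state, as in the question

def undump(self, data):
    self.data = data  # keep the state so the round trip can be checked

# A real (hypothetical) module object to host the generated classes,
# registered in sys.modules so pickle can import it by name.
dyn = types.ModuleType("dyn_types")
sys.modules["dyn_types"] = dyn

def make_type(name):
    cls = type(name, (), {
        "__getstate__": dump,
        "__setstate__": undump,
        "__module__": "dyn_types",  # where pickle will look for the class
    })
    setattr(dyn, name, cls)  # rule 1: findable with getattr on the module
    return cls

Last = make_type("Last")
restored = pickle.loads(pickle.dumps(Last()))  # rule 2: getattr(dyn, "Last") is Last
print(type(restored) is Last, restored.data)   # True b'Some data'
```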
Where I'm stuck, though, is that something odd still seems to be going on: pickle seems to be calling __getstate__ on something unexpected.
Here's the simplest setup I've got that reproduces the issue, testing with Python 3.5, but I'd like to target back to 3.3 if possible:
# module.py
import sys
import functools

def dump(self):
    return b'Some data' # Dummy for testing

def undump(self, data):
    print('Undump: %r' % data) # Do nothing for testing

# Cheaty demo way to make this consistent
@functools.lru_cache(maxsize=None)
def make_type(name):
    return type(name, (), {
        '__getstate__': dump,
        '__setstate__': undump,
    })

class Magic(object):
    def __init__(self, path):
        self.path = path

    def __getattr__(self, name):
        print('Getting thing: %s (from: %s)' % (name, self.path))
        # for simple testing all calls to make_type must end in last x.y.z.last
        if name != 'last':
            if self.path:
                return Magic(self.path + '.' + name)
            else:
                return Magic(name)
        return make_type(self.path + '.' + name)

# Make the switch
sys.modules[__name__] = Magic('')
And then a quick way to exercise that:
import module
import pickle
f=module.foo.bar.woof.last()
print(f.__getstate__()) # See, *this* works
print('Pickle starts here')
print(pickle.dumps(f))
Which then gives:
Getting thing: foo (from: )
Getting thing: bar (from: foo)
Getting thing: woof (from: foo.bar)
Getting thing: last (from: foo.bar.woof)
b'Some data'
Pickle starts here
Getting thing: __spec__ (from: )
Getting thing: _initializing (from: __spec__)
Getting thing: foo (from: )
Getting thing: bar (from: foo)
Getting thing: woof (from: foo.bar)
Getting thing: last (from: foo.bar.woof)
Getting thing: __getstate__ (from: foo.bar.woof)
Traceback (most recent call last):
File "test.py", line 7, in <module>
print(pickle.dumps(f))
TypeError: 'Magic' object is not callable
I wasn't expecting to see anything looking up __getstate__ on module.foo.bar.woof, but even if we force that lookup to fail by adding:
if name == '__getstate__': raise AttributeError()
into our __getattr__ it still fails with:
Traceback (most recent call last):
File "test.py", line 7, in <module>
print(pickle.dumps(f))
_pickle.PicklingError: Can't pickle <class 'module.Magic'>: it's not the same object as module.Magic
What gives? Am I missing something with __spec__? The docs for __spec__ pretty much just stress setting it appropriately, but don't seem to actually explain much.
More importantly, the bigger question is: how am I supposed to go about making types that I generate programmatically via a pseudo-module's __getattr__ implementation pickle properly?
(And obviously once I've managed to get pickle.dumps to produce something I expect pickle.loads to call undump with the same thing)
To pickle f, pickle needs to pickle f's class, module.foo.bar.woof.last.
The docs don't claim support for pickling arbitrary classes. They claim the following:
The following types can be pickled:
...
classes that are defined at the top level of a module
module.foo.bar.woof.last isn't defined at the top level of a module, even a pretend module like module. In this not-officially-supported case, the pickle logic ends up trying to pickle module.foo.bar.woof, either here:
elif parent is not module:
    self.save_reduce(getattr, (parent, lastname))
or here
else if (parent != module) {
    PickleState *st = _Pickle_GetGlobalState();
    PyObject *reduce_value = Py_BuildValue("(O(OO))",
                                st->getattr, parent, lastname);
    status = save_reduce(self, reduce_value, NULL);
module.foo.bar.woof can't be pickled for multiple reasons. It returns a non-callable Magic instance for all unsupported method lookups, like __getstate__, which is where your first error comes from. The module-switching thing prevents finding the Magic class to pickle it, which is where your second error comes from. There are probably more incompatibilities.
It seems, as has already been shown, that making the class callable just drifts off in another wrong direction. Thanks to this hack, though, I found a workaround that makes the class resolvable through its type. Judging by the error <class 'module.Magic'>: it's not the same object as module.Magic, the pickler's lookup of module.Magic does not return the class itself. Because module has been replaced by a Magic instance, module.Magic goes through __getattr__ and returns a new Magic instance, whose type is the real class. This is a common problem when pickling self-instantiating classes (here, an object looked up by its class), so the solution is to patch that name with the class's type: @mock.patch('module.Magic', type(module.Magic)). This is only a short answer.
Main.py
import module1
import pickle
from unittest import mock

f = module1.foo.bar.woof.last
print(f().__getstate__()) # See, *this* works
print('Pickle starts here')

@mock.patch('module1.Magic', type(module1.Magic))
def pickleit():
    return pickle.dumps(f())

print(pickleit())
Magic class
class Magic(object):
    def __init__(self, value):
        self.path = value

    __class__: lambda x:x

    def __getstate__(self):
        print("Shoot me! i'm at " + self.path)
        return dump(self)

    def __setstate__(self, value):
        print('something will never occur')
        return undump(self, value)

    def __spec__(self):
        print("Wrong side of the planet ")

    def _initializing(self):
        print("Even farther lost ")

    def __getattr__(self, name):
        print('Getting thing: %s (from: %s)' % (name, self.path))
        # for simple testing all calls to make_type must end in last x.y.z.last
        if name != 'last':
            if self.path:
                return Magic(self.path + '.' + name)
            else:
                return Magic(name)
        print('terminal stage')
        return make_type(self.path + '.' + name)
Even if this is more of a lucky edge off the bat than a clean shot, I could see the content dumped into my console.

Why does this function not work when used as a decorator?

UPDATE: As noted by Mr. Fooz, the functional version of the wrapper has a bug, so I reverted to the original class implementation. I've put the code up on GitHub:
https://github.com/nofatclips/timeout/commits/master
There are two commits, one working (using the "import" workaround) the second one broken.
The source of the problem seems to be pickle.dumps, which just spits out an identifier when called on a function. By the time I call Process, that identifier points to the decorated version of the function rather than the original one.
ORIGINAL MESSAGE:
I was trying to write a function decorator to wrap a long task in a Process that would be killed if a timeout expires. I came up with this (working but not elegant) version:
from multiprocessing import Process
from threading import Timer
from functools import partial
from sys import stdout

def safeExecution(function, timeout):
    thread = None
    def _break():
        #stdout.flush()
        #print (thread)
        thread.terminate()
    def start(*kw):
        timer = Timer(timeout, _break)
        timer.start()
        thread = Process(target=function, args=kw)
        ret = thread.start() # TODO: capture return value
        thread.join()
        timer.cancel()
        return ret
    return start

def settimeout(timeout):
    return partial(safeExecution, timeout=timeout)

#@settimeout(1)
def calculatePrimes(maxPrimes):
    primes = []
    for i in range(2, maxPrimes):
        prime = True
        for prime in primes:
            if (i % prime == 0):
                prime = False
                break
        if (prime):
            primes.append(i)
            print ("Found prime: %s" % i)
if __name__ == '__main__':
    print (calculatePrimes)
    a = settimeout(1)
    calculatePrime = a(calculatePrimes)
    calculatePrime(24000)
As you can see, I commented out the decorator and assigned the modified version of calculatePrimes to calculatePrime. If I tried to reassign it to the same variable, I'd get a "Can't pickle <class 'function'>: attribute lookup builtins.function failed" error when trying to call the decorated version.
Does anybody have any idea of what is happening under the hood? Is the original function being turned into something different when I assign the decorated version to the identifier referencing it?
UPDATE: To reproduce the error, I just change the main part to
if __name__ == '__main__':
    print (calculatePrimes)
    a = settimeout(1)
    calculatePrimes = a(calculatePrimes)
    calculatePrimes(24000)
    #sleep(2)
which yields:
Traceback (most recent call last):
File "c:\Users\mm\Desktop\ING.SW\python\thread2.py", line 49, in <module>
calculatePrimes(24000)
File "c:\Users\mm\Desktop\ING.SW\python\thread2.py", line 19, in start
ret = thread.start()
File "C:\Python33\lib\multiprocessing\process.py", line 111, in start
self._popen = Popen(self)
File "C:\Python33\lib\multiprocessing\forking.py", line 241, in __init__
dump(process_obj, to_child, HIGHEST_PROTOCOL)
File "C:\Python33\lib\multiprocessing\forking.py", line 160, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtins.function failed
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Python33\lib\multiprocessing\forking.py", line 344, in main
self = load(from_parent)
EOFError
P.S. I also wrote a class version of safeExecution, which has exactly the same behaviour.
Move the function to a module that's imported by your script.
Functions are only picklable in Python if they're defined at the top level of a module. Ones defined in scripts are not picklable by default. Module-based functions are pickled as two strings: the name of the module, and the name of the function. They're unpickled by dynamically importing the module then looking up the function object by name (hence the restriction on top-level-only functions).
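The by-name behaviour is easy to observe. This sketch uses illustrative names (calculate, make_wrapper and start). It pickles a top-level function, then rebinds the module-level name the way the question's `calculatePrimes = a(calculatePrimes)` does. After that, pickle's name lookup finds the wrapper instead of the original function and refuses to pickle it. The exact error type and text vary between Python versions, which is why two exception types are caught.

```python
import pickle

def calculate(n):
    return n * 2

# A top-level function pickles as a reference: module name plus function name.
payload = pickle.dumps(calculate)
by_reference = b"calculate" in payload  # the name is stored, the code is not
print(by_reference)  # True

def make_wrapper(func):
    def start(*args):
        return func(*args)
    return start

original = calculate
calculate = make_wrapper(calculate)  # what decorating or rebinding the name does

error = None
try:
    pickle.dumps(original)  # the module-level name now finds start, not original
except (pickle.PicklingError, AttributeError) as exc:
    error = exc
print(type(error).__name__, error)
```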
It's possible to extend the pickle handlers to support semi-generic function and lambda pickling, but doing so can be tricky. In particular, it can be difficult to reconstruct the full namespace tree if you want to properly handle things like decorators and nested functions. If you want to do this, it's best to use Python 2.7 or later or Python 3.3 or later (earlier versions have a bug in the dispatcher of cPickle and pickle that's unpleasant to work around).
Is there an easy way to pickle a python function (or otherwise serialize its code)?
Python: pickling nested functions
http://bugs.python.org/issue7689
EDIT:
At least in Python 2.6, the pickling works fine for me if the script only contains the if __name__ block, the script imports calculatePrimes and settimeout from a module, and if the inner start function's name is monkey-patched:
def safeExecution(function, timeout):
    ...
    def start(*kw):
        ...
    start.__name__ = function.__name__ # ADD THIS LINE
    return start
There's a second problem that's related to Python's variable scoping rules. The assignment to the thread variable inside start creates a new local variable whose scope is limited to one evaluation of the start function. It does not assign to the thread variable found in the enclosing scope. You can't use the global keyword to override the scope because you want an intermediate scope, and Python 2 only lets you rebind the local-most and global-most scopes, not any intermediate ones (Python 3 adds the nonlocal statement for exactly this case). You can overcome this problem by placing the thread object in a container that's housed in the intermediate scope. Here's how:
def safeExecution(function, timeout):
    thread_holder = [] # MAKE IT A CONTAINER
    def _break():
        #stdout.flush()
        #print (thread)
        thread_holder[0].terminate() # REACH INTO THE CONTAINER
    def start(*kw):
        ...
        thread = Process(target=function, args=kw)
        thread_holder.append(thread) # MUTATE THE CONTAINER
        ...
    start.__name__ = function.__name__ # MAKES THE PICKLING WORK
    return start
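On Python 3, which the asker is using (C:\Python33), the nonlocal statement is an alternative to the container. Here is a minimal sketch. FakeProcess is a hypothetical stand-in for multiprocessing.Process, and a direct call to _break() replaces the timer so that the scoping is easy to check.

```python
def safe_execution(make_worker):
    thread = None  # lives in the enclosing scope

    def _break():
        thread.terminate()  # reads the enclosing variable

    def start(*args):
        nonlocal thread  # rebind the enclosing variable, not a new local
        thread = make_worker(*args)
        _break()  # stand-in for the Timer firing
        return thread

    return start

class FakeProcess:
    """Hypothetical stand-in for multiprocessing.Process: only terminate()."""
    def __init__(self, *args):
        self.args = args
        self.terminated = False

    def terminate(self):
        self.terminated = True

proc = safe_execution(FakeProcess)(24000)
print(proc.terminated)  # True
```

Without the nonlocal line, `thread` inside start would be a fresh local, and _break would call terminate() on None.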
I'm not really sure why you get that problem, but to answer your title question: why does the decorator not work?
When you pass arguments to a decorator, you need to structure the code slightly differently. One way is to implement the decorator as a class with an __init__ and a __call__.
In __init__, you collect the arguments that you send to the decorator, and in __call__, you get the function you decorate:
class settimeout(object):
    def __init__(self, timeout):
        self.timeout = timeout

    def __call__(self, func):
        def wrapped_func(n):
            func(n, self.timeout)
        return wrapped_func

@settimeout(1)
def func(n, timeout):
    print("Func is called with", n, 'and', timeout)

func(24000)
This should get you going on the decorator front at least.
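A class is not the only option. The same decorator-with-arguments can also be written as a plain function factory, which is essentially what the question's partial-based settimeout does. Here is a sketch of that alternative. It uses functools.wraps so that the wrapper keeps the decorated function's name.

```python
import functools

def settimeout(timeout):
    """Decorator factory: settimeout(1) returns the real decorator."""
    def decorator(func):
        @functools.wraps(func)  # copies func's __name__, __qualname__, __doc__, ...
        def wrapped_func(n):
            return func(n, timeout)
        return wrapped_func
    return decorator

@settimeout(1)
def func(n, timeout):
    return "Func is called with %s and %s" % (n, timeout)

print(func(24000))    # Func is called with 24000 and 1
print(func.__name__)  # func
```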

How to override built-in getattr in Python?

I know how to override an object's __getattr__() to handle calls to undefined methods. However, I would like to achieve the same behavior for the builtin getattr() function. For instance, consider code like this:
call_some_undefined_function()
Normally, that simply produces an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'call_some_undefined_function' is not defined
I want to override getattr() so that I can intercept the call to "call_some_undefined_function()" and figure out what to do.
Is this possible?
I can only think of a way to do this by calling eval.
class Global(dict):
    def undefined(self, *args, **kargs):
        return u'ran undefined'

    def __getitem__(self, key):
        if dict.has_key(self, key):
            return dict.__getitem__(self, key)
        return self.undefined

src = """
def foo():
    return u'ran foo'

print foo()
print callme(1,2)
"""
code = compile(src, '<no file>', 'exec')
globals = Global()
eval(code, globals)
The above outputs
ran foo
ran undefined
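The code above is Python 2 (print statements, dict.has_key). A Python 3 sketch of the same idea follows, with two changes. First, it uses dict's __missing__ hook instead of overriding __getitem__. Second, it passes a plain dict as globals and the custom mapping as locals, because the exec documentation only allows an arbitrary mapping for locals. One consequence is that the interception applies to module-level code in src, not to names looked up inside functions defined there. Builtins are checked explicitly, because once the mapping answers every lookup, Python never falls back to them on its own.

```python
import builtins

class Global(dict):
    def undefined(self, *args, **kwargs):
        return "ran undefined"

    def __missing__(self, key):
        # dict.__getitem__ calls this only for keys that are not in the dict.
        # Returning a value here means the builtins are never consulted, so
        # check them first to keep names like print and len working.
        if hasattr(builtins, key):
            return getattr(builtins, key)
        return self.undefined

src = """
def foo():
    return 'ran foo'

results = [foo(), callme(1, 2), len('abc')]
"""
namespace = Global()
exec(compile(src, "<no file>", "exec"), {}, namespace)
print(namespace["results"])  # ['ran foo', 'ran undefined', 3]
```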
You haven't said why you're trying to do this. I had a use case where I wanted to be able to handle typos I made during interactive Python sessions, so I put this into my Python startup file:
import sys
import re

def nameErrorHandler(type, value, traceback):
    if not isinstance(value, NameError):
        # Let the normal error handler handle this:
        nameErrorHandler.originalExceptHookFunction(type, value, traceback)
        return
    name = re.search(r"'(\S+)'", str(value)).group(1)
    # At this point we know that there was an attempt to use name
    # which ended up not being defined anywhere.
    # Handle this however you want...

nameErrorHandler.originalExceptHookFunction = sys.excepthook
sys.excepthook = nameErrorHandler
Hopefully this is helpful for anyone in the future who wants to have a special error handler for undefined names... whether this is helpful for the OP or not is unknown since they never actually told us what their intended use-case was.
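Exceptions in Python 3 no longer have a .message attribute. Below is a sketch of the same hook for Python 3; undefined_name and name_error_handler are illustrative names. On Python 3.10 and later the interpreter also stores the missing name in NameError.name, so the regex serves only as a fallback for older versions.

```python
import re
import sys

def undefined_name(exc):
    """Return the name a NameError complains about, or None."""
    name = getattr(exc, "name", None)  # set by the interpreter on Python 3.10+
    if name:
        return name
    match = re.search(r"'(\S+)'", str(exc))  # fallback: parse the message
    return match.group(1) if match else None

def name_error_handler(exc_type, exc, tb):
    if not issubclass(exc_type, NameError):
        # Not ours: hand off to whatever hook was installed before.
        return name_error_handler.original(exc_type, exc, tb)
    print("Undefined name: %s" % undefined_name(exc))

name_error_handler.original = sys.excepthook
sys.excepthook = name_error_handler

caught = None
try:
    call_some_undefined_function()
except NameError as err:
    caught = undefined_name(err)
print(caught)  # call_some_undefined_function
```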
