How to make a deep copy of a Python class?

I'd like to make a copy of a class, while updating all of its methods to refer to a new set of __globals__.
I was thinking of something like the below. However, unlike types.FunctionType, the constructor for types.UnboundMethodType does not accept __globals__; any suggestions on how to work around this?
import types

def copy_class(old_class, new_module):
    """Copies a class, updating __globals__ of all methods to point to new_module"""
    new_dict = {}
    for name, entry in old_class.__dict__.items():
        if isinstance(entry, types.UnboundMethodType):
            # This is the problem: UnboundMethodType has no globals argument
            entry = types.UnboundMethodType(name, None, old_class.__class__,
                                            globals=new_module.__dict__)
        new_dict[name] = entry
    return type(old_class.__name__, old_class.__bases__, new_dict)

The __dict__ values are functions, not unbound methods. The unbound method objects only get created on attribute access. If you are seeing unbound method objects in the __dict__, something weird happened with your class object before this function got to it.
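Building on that observation, here is a minimal sketch of a workaround: since the __dict__ entries are plain functions, rebuild each one with types.FunctionType, which (unlike UnboundMethodType) does accept a globals mapping. This is a sketch, not a battle-tested implementation:

import types

def copy_class(old_class, new_module):
    """Sketch: rebuild each function so its __globals__ is new_module's dict."""
    new_dict = {}
    for name, entry in old_class.__dict__.items():
        if isinstance(entry, types.FunctionType):
            # Unbound/bound methods are created on attribute access,
            # so rebuilding the underlying function is enough.
            entry = types.FunctionType(entry.__code__,
                                       new_module.__dict__,
                                       entry.__name__,
                                       entry.__defaults__,
                                       entry.__closure__)
        new_dict[name] = entry
    return type(old_class.__name__, old_class.__bases__, new_dict)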

I don't know about you, but I generally don't like to use types for anything other than type checking (which I don't do very often ;-). I'd much rather inspect...
I have to preface this code by saying that I hope you have a really good reason for wanting to do this ;-) -- to me, it seems like just subclassing and overriding class properties should get the job done much more elegantly. However, if you really want to copy a class -- why not just execute its source again in the new namespace?
I've put together the following simple modules:
# test.py
# Just some test data
FOO = 1

class Bar(object):
    def subclass_method(self):
        print('Hello World!')

class Foo(Bar):
    def method(self):
        return FOO
And then something to do the heavy lifting:
import sys
import inspect

def copy_class(cls, new_globals):
    source = inspect.getsource(cls)
    globs = {}
    globs.update(sys.modules[cls.__module__].__dict__)
    globs.update(new_globals)
    exec source in globs
    return globs[cls.__name__]
# Check that it works...
import test
NewFoo = copy_class(test.Foo, {'FOO': 2})
print NewFoo().method()
NewFoo().subclass_method()
print test.Foo().method()
test.Foo().subclass_method()
This has some possibly desirable properties and some undesirable ones... First, it only works on classes that are inspectable. That's pretty much anything user-defined, so probably not too restrictive... It also might be a bit slower than other solutions that don't involve re-parsing the source string -- but again, it doesn't seem like this should be executed too frequently, so that's probably OK.
Now the "advantages"...
If a global is requested by a function but not supplied, this will use the global from the old namespace. If this behavior isn't desirable (i.e. you'd rather have the NameError), you can modify the function easily to remove it.
The "copy" doesn't inherit from the original. For most purposes, that probably doesn't matter, but it's a bit weird to have the copy of something inherit from the original ...
Some people might see the exec in here and immediately think "Oh no! exec!?!?! The world is about to end!!!". Frankly, that's a good default response. However, I argue that if you're copying a function that you plan to use later in the code, it is no more safe than using exec (after all, the function's code has already been executed).

Related

In Python, is it possible to get an object without a name from another module?

I want to get every object created by another module, even objects that have no name or reference. Is that possible? For example:
in module1.py, there's only one line code:
MyClass()
in module2.py:
module1 = __import__("module1")
# print something about the MyClass instance from module1
What you're trying to do is generally impossible.
An object that has no name or other reference is garbage. That's the technical meaning of the term "garbage". In CPython (the Python implementation you're probably using if you don't know which one you're using), garbage is collected immediately—as soon as that MyClass() statement ends, the instance gets destroyed.
So, you can't access the object, because it doesn't exist.
In some other Python implementations, the object may not be destroyed until the next garbage collection cycle, but that's going to be pretty soon, and it's not deterministic exactly when—and you still have no way to get at it before it's destroyed. So it might as well not exist, even if it hasn't actually been finalized yet.
Now, "generally" means there are some exceptions. They're not common, but they do exist.
For example, imagine a class like this:
class MyClass:
    _instances = []
    def __init__(self):
        MyClass._instances.append(self)
Now, when you do MyClass(), there actually is a reference to that instance, so it's not garbage. And, if you know where it is (which you'd presumably find in the documentation, or in the source code), you can access it as MyClass._instances[-1]. But it's unlikely that an arbitrary class MyClass does anything like this.
OK, I lied. There is sort of a way to do this, but (a) it’s cheating, and (b) it’s almost certainly a terrible idea that has no valid use cases you’ll ever think of. But just for fun, here’s how you could do this.
You need to write an import hook, and make sure it gets installed before the first time you import the module. Then you can do almost anything you want. The simplest idea I can think of is transforming the AST to turn every expression statement (or maybe just every expression statement at the top level) into an assignment statement that assigns to a hidden variable. You can even make the variable name an invalid identifier, so it'll be safe to run on any legal module no matter what's in the global namespace. Then you can access the first object created and abandoned by the module as something like module.globals()['.0'].
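Here is a minimal sketch of just the AST-rewrite step described above (no import hook; the source string and the MyClass stand-in are made up for illustration):

import ast

def capture_module_expressions(source):
    """Sketch: turn each top-level expression statement into an
    assignment to a hidden name ('.0', '.1', ...); these are invalid
    identifiers, so they can't clash with real globals."""
    tree = ast.parse(source)
    for i, node in enumerate(tree.body):
        if isinstance(node, ast.Expr):
            target = ast.Name(id='.%d' % i, ctx=ast.Store())
            assign = ast.Assign(targets=[target], value=node.value)
            tree.body[i] = ast.copy_location(assign, node)
    ast.fix_missing_locations(tree)
    return compile(tree, '<module1>', 'exec')

# Stand-in for module1.py's single line:
code = capture_module_expressions("MyClass()")
globs = {'MyClass': type('MyClass', (), {})}
exec(code, globs)
print(globs['.0'])  # the instance that would otherwise be garbage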

Proper way to return private object in Python

I'm new to Python (and liking it so far) but have many years' experience with OO languages like C++ and C#, and consider myself a strong OO designer.
My understanding is that Python does not strictly enforce private object properties, but that by convention, if you prefix a property's name with an underscore, people will know not to access it outside the class. OK, fair enough.
My question: if an object contains "private" object and I return it to a caller, should I make a copy so they can't mess it up? Or does Python automatically make a copy?
# My Channel class has a dictionary of capabilities
class Channel(object):
    def __init__(self):
        self._capabilities = dict()
If I do the following can the caller mess with my capabilities by messing with the returned dictionary?
@property
def capabilities(self):
    return self._capabilities
Or should I do this and return a copy to protect myself?
@property
def capabilities(self):
    # I'm assuming that this creates a new copy of the dictionary
    return dict(self._capabilities)
I am guessing that Python returns a reference so that the caller can indeed mess with my private dictionary (or list, or whatever) so I better make a copy first.
If you're using Python 3.3 or above, there is a class in the standard library for this: types.MappingProxyType. Its constructor takes a dictionary and returns a read-only view of it. If you return this kind of object instead of a copy of the dictionary, the returned MappingProxyType will raise an exception if client code tries to alter it.
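For instance, a minimal sketch of that approach, reusing the Channel class from the question:

import types

class Channel(object):
    def __init__(self):
        self._capabilities = dict()

    @property
    def capabilities(self):
        # Live, read-only view; mutation attempts raise TypeError.
        return types.MappingProxyType(self._capabilities)

Unlike a copy, the proxy always reflects the current contents of _capabilities.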
You can also make your class emulate an immutable mapping by inheriting from collections.abc.Mapping and implementing three special methods: __getitem__, __iter__, and __len__. Then client code could access any item in _capabilities but could not modify it. A client could even iterate over the whole set.
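A minimal sketch of that Mapping approach (the Capabilities wrapper name is made up for illustration):

from collections.abc import Mapping

class Capabilities(Mapping):
    """Read-only wrapper around a dict of capabilities."""
    def __init__(self, data):
        self._data = data  # wraps the dict, doesn't copy it

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

Reads, len(), and iteration all work; item assignment raises TypeError because no __setitem__ is defined.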
But Python philosophy ("we're all adults here") says that perhaps it is better to return the dictionary and trust the user's code not to mess with it. Trying to get Python to emulate C++ is not necessarily the best approach. As you point out, Python doesn't actually prevent the client from using variables that begin with an underscore.
I slept on it and realized I could just write a test and figure out my own answer. When I return an object (a dictionary in my test) then I get a reference to the actual private object. If I add an entry to what gets returned then it adds an entry to the original object's dictionary.
So if I want to protect against that then I need to create a copy and return.
@property
def capabilities(self):
    # This creates a new (shallow) copy of the dictionary
    return dict(self._capabilities)
I think part of my original question was whether this approach is the common pattern for Python. It is in C#, and I intend to adopt it as a general practice.
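For reference, a quick version of that test, assuming the direct-return version of the property (the 'speed' key is made up):

ch = Channel()
caps = ch.capabilities    # direct return of self._capabilities
caps['speed'] = 'fast'    # mutates the "private" dict in place
print(ch._capabilities)   # {'speed': 'fast'} -- the original changed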
Yes, attributes starting with one underscore are considered private. You can access or modify them, but you shouldn't.
However, when you create a public attribute, you essentially grant the user permission to modify what it returns. Python always returns references; the question is just whether the referenced object is mutable or immutable. Dictionaries and lists are mutable, so the caller can change their contents, while other types like numbers and strings are immutable, so they are "safe to return".
Instead of thinking about how to return a copy, you should think about which "properties" and "methods" of _capabilities are of interest to a user. For example, if you just want a "has_capability" and a "value_of_capability", you could simply create functions for that:
class Channel(object):
    def __init__(self):
        self._capabilities = dict()

    def has_capability(self, capability):
        return capability in self._capabilities

    def value_of_capability(self, capability):
        return self._capabilities[capability]
and likewise for other operations that should be supported. It doesn't make sense to hide an attribute and then "expose" it (no matter whether as a reference or as a copy). The problem with a copy is that it is slow, and it's likely to lead to surprises because you can modify it but the changes don't propagate back. That's not really intuitive.

Accept different types in python function?

I have a Python function that does a lot of major work on an XML file.
When using this function, I want two options: either pass it the name of an XML file, or pass it a pre-parsed ElementTree instance.
I'd like the function to be able to determine what it was given in its variable.
Example:
def doLotsOfXmlStuff(xmlData):
    if xmlData is not an ET instance:  # pseudocode for the check I want
        xmlData = ET.parse(xmlData)
    # do a bunch of stuff
    return stuff
The app calling this function may need to call it just once, or it may need to call it several times. Calling it several times and parsing the XML each time is hugely inefficient and unnecessary. Creating a whole class just to wrap this one function seems a bit overkill and would end up requiring some code refactoring. For example:
ourResults = doLotsOfXmlStuff(myObject)
would have to become:
xmlObject = XMLProcessingObjectThatHasOneFunction("data.xml")
ourResult = xmlObject.doLotsOfXmlStuff()
And if I had to run this on lots of small files, a class would be created each time, which seems inefficient.
Is there a simple way to detect the type of the variable coming in? I know a lot of Pythoners will say "you shouldn't have to check", but here's one good instance where you would.
In other strongly typed languages I could do this with method overloading, but that's obviously not the Pythonic way of things...
The principle of "duck typing" is that you shouldn't care so much about the specific type of an object; rather, you should check whether it supports the APIs in which you're interested.
In other words, if the object passed to your function through the xmlData argument has some method or attribute which is indicative of an ElementTree that's been parsed, then you just use those methods or attributes; if it doesn't have the necessary attribute, then you are free to pass it through some parsing.
So which functions/methods/attributes of the resulting ElementTree are you looking to use? You can use hasattr() to check for them. Alternatively, you can wrap your call to any such functionality in a try: ... except AttributeError: block.
Personally I think if not hasattr(...): is a bit cleaner. (If it doesn't have the attribute I want, then rebind the name to something which has been prepared, parsed, whatever, as I need it.)
This approach has advantages over isinstance() because it allows users of your functionality to pass references to objects in their own classes which have extended ET through composition rather than inheritance. In other words, if I wrap an ET-like object in my own class and expose the necessary functionality, then I should be able to pass references to your function and have you just treat my object as if it were a "duck", even if it wasn't a descendant of a duck. If you need feathers, a bill, and webbed feet, then just check for one of those and try to use the rest. I may be a black box containing a duck, with holes through which the feet, duck-bill, and feathers are accessible.
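A minimal sketch of that duck-typed check, assuming getroot() is the indicative attribute you need (adjust to whatever part of the API you actually use):

import xml.etree.ElementTree as ET

def doLotsOfXmlStuff(xmlData):
    # A parsed tree quacks like one: it has getroot(); a filename doesn't.
    if not hasattr(xmlData, 'getroot'):
        xmlData = ET.parse(xmlData)
    root = xmlData.getroot()
    # ... do a bunch of stuff ...
    return root.tag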
This is a fairly normal pattern (e.g. Python function that accepts file object or path). Just use isinstance:
import xml.etree.ElementTree as ET

def doLotsOfXmlStuff(xmlData):
    if not isinstance(xmlData, ET.ElementTree):
        xmlData = ET.parse(xmlData)
    ...
If you need to do cleanup (e.g. closing files) then calling your function recursively is OK:
def doLotsOfXmlStuff(xmlData):
    if not isinstance(xmlData, ET.ElementTree):
        xmlData = ET.parse(xmlData)
        ret = doLotsOfXmlStuff(xmlData)
        ...  # cleanup (or use a context manager)
        return ret
    ...
You can use isinstance to determine the type of a variable.
Can you try to put an if statement to check the type and determine what to run from there?
if type(xmlData).__name__ == 'ElementTree':
    pass  # do stuff
else:
    pass  # do some other stuff
I think you can just compare the types:

if type(xmlData) == some_type:
    function1()
else:
    function2()

object type casting in Python after reloading a module? [for on-the-fly code changes]

I am running an interactive Python session which builds big data structures (5+ GB) that take a long time to load, so I want to exploit Python's on-the-fly code-change abilities to the maximum (though sometimes without having to plan too much for it).
My current problem is the following: I have an old instance of a class, and I have since modified the class's code and reloaded the module -- I would like the old instance to be able to use the new function definitions. How do I do that without just manually copying all the information from the old instance to a new, fresh instance?
Here is what I have tried. Suppose I have the module M.py:
class A():
    def f(self):
        print "old class"
Here is an interactive session:
import M
old_a = M.A()
# [suppose now I change the definition of M.A.f in the source file]
reload(M)
# I attempt to use the new class definition with the old instance:
M.A.f(old_a)
at which point I get the following type error from Python:
TypeError: unbound method f() must be called with A instance as first argument (got A instance instead)
Python is obviously not happy to receive an old instance of A even though they are basically functionally equivalent types (in my code) -- is there any way I could 'type cast' it to the new instance type so that Python wouldn't complain? Something morally like: M.A.f( (M.A) old_a ) ?
There is no casting in Python, but you can change the class of an existing object: it is perfectly legal and does the job:
old_a.__class__=M.A
old_a.f()
As long as you haven't changed the relation between class methods and instance variables, changed what __init__ does, or something like that, this is perfectly fine.
EDIT: As jsbueno points out: The __init__ or __new__ methods are not called at the point of changing __class__. Further, the new __del__ will be called at destruction.
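To see the effect without a module reload, here is a self-contained sketch of the same trick (class names made up):

class OldA(object):
    def f(self):
        print("old class")

class NewA(object):
    def f(self):
        print("new class")

a = OldA()
a.__class__ = NewA  # "cast" the existing instance in place
a.f()               # prints: new class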
Since you cannot cast, you need to revise your code so that these mysterious "on-the-fly code changes" can work.
Step 1. Separate Algorithm from Data. Write a very simple (and very unlikely to change) class for the raw Data. Often a list of named tuples is all you'll ever need for this.
Step 2. Create algorithms which work on the data objects by "wrapping" them instead of "updating" them.
Like this.
def some_complex_algo(list_of_named_tuples):
    for item in list_of_named_tuples:
        # some calculation
        yield NewTuple(result1, result2, ..., item)
Now you can attempt your processing:
result = list( some_complex_algo( source_data ) )
If you don't like the result, you only need to redefine your some_complex_algo and rerun it. The source_data is untouched. Indeed, it can be immutable.
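A small sketch of that wrap-don't-update pattern (the Reading and Scored tuple types are made up for illustration):

from collections import namedtuple

Reading = namedtuple('Reading', ['sensor', 'value'])
Scored = namedtuple('Scored', ['score', 'source'])

def some_complex_algo(readings):
    for item in readings:
        # some calculation; the source item rides along, untouched
        yield Scored(score=item.value * 2, source=item)

source_data = [Reading('a', 1), Reading('b', 3)]
result = list(some_complex_algo(source_data))
# Don't like the result?  Redefine some_complex_algo and rerun;
# source_data never changes.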

Python functions can be given new attributes from outside the scope?

I didn't know you could do this:
def tom():
    print "tom's locals: ", locals()

def dick(z):
    print "z.__name__ = ", z.__name__
    z.guest = "Harry"
    print "z.guest = ", z.guest
    print "dick's locals: ", locals()

tom()                         #>>> tom's locals: {}
#print tom.guest              #AttributeError: 'function' object has no attribute 'guest'
print "tom's dir:", dir(tom)  # no 'guest' entry

dick(tom)                     #>>> z.__name__ = tom
                              #>>> z.guest = Harry
                              #>>> dick's locals: {'z': <function tom at 0x02819F30>}
tom()                         #>>> tom's locals: {}
#print dick.guest             #AttributeError: 'function' object has no attribute 'guest'
print tom.guest               #>>> Harry
print "tom's dir:", dir(tom)  # 'guest' entry appears
Function tom() has no locals. Function dick() knows where tom() lives and puts up Harry as 'guest' over at tom()'s place. harry doesn't appear as a local at tom()'s place, but if you ask for tom's guest, harry answers. harry is a new attribute at tom().
UPDATE: From outside tom(), you can say "print dir(tom)" and see the tom-object's dictionary. (You can do it from inside tom(), too. So tom could find out he had a new lodger, harry, going under the name of 'guest'.)
So, attributes can be added to a function's namespace from outside the function? Is that often done? Is it acceptable practice? Is it recommended in some situations? Is it actually vital at times? (Is it Pythonic?)
UPDATE: Title now says 'attributes'; it used to say 'variables'. Here's a PEP about Function Attributes.
I think you might be conflating the concepts of local variables and function attributes. For more information on Python function attributes, see the SO question Python function attributes - uses and abuses.
@behindthefall, the motivation to give function objects generic assignable attributes (they didn't always have them) was that, absent such possibilities, real and popular frameworks were abusing what few assignable attributes existed (typically __doc__) to record information about each given function object. So there was clearly a "pent-up demand" for this functionality, and Guido decided to address it directly (adding an optional dict to each function object to record its attributes isn't a big deal -- most function objects don't need it, and it is optional, so the cost is just 4 bytes for a null pointer ;-).
Assigning such attributes in arbitrary places would be very bad practice, making the code harder to understand for no real benefit, but they're very useful when used in a controlled way -- for example, a decorator could usefully record all kinds of things about the function being decorated, and the context in which the decoration occurred, as attributes of the wrapper function, allowing trivially-easy introspection of such metadata to occur later at any time, as needed.
As other answers already pointed out, local variables (which are per-instance, not per-function object!) are a completely disjoint namespace from a function object's attributes held in its __dict__.
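As a sketch of the decorator use case described above (the names logged, call_count, and decorated_by are made up):

import functools

def logged(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.call_count += 1  # attribute set from outside the function
        return func(*args, **kwargs)
    # Record metadata about the decoration as wrapper attributes:
    wrapper.call_count = 0
    wrapper.decorated_by = 'logged'
    return wrapper

@logged
def greet(name):
    return "Hello, %s!" % name

greet("World")
print(greet.call_count)    # 1
print(greet.decorated_by)  # 'logged'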
In python, a namespace is just a dictionary object, mapping variable name as a string (in this case, 'guest') to a value (in this case, 'Harry'). So as long as you have access to an object, and it's mutable, you can change anything about its namespace.
On small projects, it's not a huge problem, and lets you hack things together faster, but incredibly confusing on larger projects, where your data could be modified from anywhere.
There are ways of making attributes of classes "more private", such as Name Mangling.
tom.guest is just an attribute on the tom function object; it has nothing to do with the scope or locals() inside that function, and nothing to do with the fact that tom is a function -- it would work on any object.
I have used this in the past to make a self-contained function with "enums" that go along with it.
Suppose I were implementing a seek() function. The built-in Python one (on file objects) takes an integer to tell it how to operate; yuck, give me an enum please.
def seek(f, offset, whence=0):
    return f.seek(offset, whence)

seek.START = 0
seek.RELATIVE = 1
seek.END = 2

f = open(filename)
seek(f, 0, seek.START)  # seek to start of file
seek(f, 0, seek.END)    # seek to end of file
What do you think, too tricky and weird? I do like how it keeps the "enum" values bundled together with the function; if you import the function from a module, you get its "enum" values as well, automatically.
Python functions are lexically scoped so there is no way to add variables to the function outside of its defined scope.
However, the function still will have access to all parent scopes, if you really wanted to design the system like that (generally considered bad practice though):
>>> def foo():
...     def bar():
...         print x
...     x = 1
...     bar()
...
>>> foo()
1
Mutating function variables is mostly a bad idea, since functions are assumed to be immutable. The most pythonic way of implementing this behavior is using classes and methods instead.
Python API documentation generation tools, such as pydoc and epydoc, use introspection to determine a function's name and docstring (available as the __name__ and __doc__ attributes). Well-behaved function decorators are expected to preserve these attributes, so such tools continue to work as expected (i.e. decorating a function should preserve the decorated function's documentation). You do this by copying these attributes from the decorated function to the decorator. Take a look at update_wrapper in the functools module:
WRAPPER_ASSIGNMENTS = ('__module__', '__name__', '__doc__')
WRAPPER_UPDATES = ('__dict__',)

def update_wrapper(wrapper,
                   wrapped,
                   assigned = WRAPPER_ASSIGNMENTS,
                   updated = WRAPPER_UPDATES):
    """Update a wrapper function to look like the wrapped function

       wrapper is the function to be updated
       wrapped is the original function
       ...
    """
    for attr in assigned:
        setattr(wrapper, attr, getattr(wrapped, attr))
    for attr in updated:
        getattr(wrapper, attr).update(getattr(wrapped, attr, {}))
    ...
So, that's at least one example where modifying function attributes is useful and accepted.
In some situations, it can be useful to "annotate" a function by setting an attribute; Django uses this in a few places:

You can set alters_data to True on model methods that change the database, preventing them from being called in templates.

You can set allow_tags on model methods that will be displayed in the admin, to signify that the method returns HTML content, which shouldn't be automatically escaped.
As always, use your judgement. If modifying attributes is accepted practice (for example, when writing a decorator), then by all means go ahead. If it's going to be part of a well documented API, it's probably fine too.
