I'd like to provide defaults for missing values when deserialising with Python's pickle serialiser. Since the classes are simple, the defaults are naturally present in the classes' __init__ methods.
I can see from the pickle documentation that there is __getnewargs__. However, this only helps in cases where __getnewargs__ was already defined when the object was pickled.
Is there any way to tell Python's pickle to always call the constructor rather than starting with an uninitialised object?
Unpickling will always create an instance without calling __init__(). This is by design. In Python 2 it was possible to override __getinitargs__() to cause unpickling to call __init__() with some arguments, but it was necessary to have had this method overridden at pickling time. This is not available in Python 3 anymore.
To achieve what you want, wouldn't it be enough to just manually call self.__init__() from self.__setstate__(state)? You can provide any default arguments not found in state.
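A minimal sketch of that idea (the Point class and its label default are made up for illustration):

import pickle

class Point:
    def __init__(self, x=0, y=0, label="origin"):
        self.x = x
        self.y = y
        self.label = label  # attribute added in a newer version of the class

    def __setstate__(self, state):
        # Apply all the defaults by re-running __init__, then overlay whatever
        # was actually stored in the pickled state.
        self.__init__()
        self.__dict__.update(state)

# A pickle written by an older version of Point that had no 'label' attribute
# still comes back with the default filled in:
old = Point(1, 2)
del old.label                       # simulate the old object layout
p = pickle.loads(pickle.dumps(old))
print(p.x, p.y, p.label)            # -> 1 2 origin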
The inspect.signature doc states that it supports classes as input, but it doesn't go into any sort of detail:
Accepts a wide range of Python callables, from plain functions and classes to functools.partial() objects.
If I call inspect.signature(MyClass), what signature does it return? Does it return the signature of MyClass.__init__? Or MyClass.__new__? Or something else?
It tries pretty much everything it reasonably could. I think the details are probably deliberately undocumented, because they're complicated and likely to get more so as new Python versions add more stuff to try.
For example, as of CPython 3.7.3, the code path tries the following things in order:
If the metaclass has a custom __call__ defined in Python, it uses the signature of the metaclass __call__ with the first argument removed.
Otherwise, if the class has a __new__ method defined in Python, it uses the __new__ signature with the first argument removed.
Otherwise, if the class has an __init__ method defined in Python, it uses the __init__ signature with the first argument removed.
Otherwise, it traverses the MRO looking for a __text_signature__. If it finds one, it parses __text_signature__ to get the signature information.
If it still hasn't found anything, if the type's __init__ is object.__init__ and the type's __new__ is object.__new__, it returns the signature of the object class. (There's a misleading comment and a possible bug involving metaclasses around this point - the comment says it's going to check for type.__init__, but it doesn't do that. I think this commit may have made a mistake here.)
If it still hasn't found anything, it gives up and raises a ValueError saying it couldn't find anything.
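For instance, here's roughly what the common cases look like in practice (class names invented for illustration; the comments show what a CPython 3.x session printed for me):

import inspect

class WithInit:
    def __init__(self, a, b=1):
        pass

class WithNew:
    def __new__(cls, x, *, flag=False):
        return super().__new__(cls)

print(inspect.signature(WithInit))   # (a, b=1)           -- taken from __init__, self removed
print(inspect.signature(WithNew))    # (x, *, flag=False) -- taken from __new__, cls removed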
Consider the following code. I expected it to raise an error, but it worked. mydef1(self) should only be invoked with an instance of MyClass1 as an argument, but it accepts MyClass1 itself, and even the completely unrelated object, as the instance.
Can someone explain why mydef1 accepts the class name (MyClass1) and object as the argument?
class MyClass1:
    def mydef1(self):
        return "Hello"

print(MyClass1.mydef1(MyClass1))
print(MyClass1.mydef1(object))
Output
Hello
Hello
There are several parts to the answer to your question because your question signals confusion about a few different aspects of Python.
First, type names are not special in Python. They're just another variable. You can even do something like object = 5 and cause all kinds of confusion.
Secondly, the self parameter is just that, a parameter. When you say MyClass1.mydef1 you're asking for the value of the attribute named mydef1 on the object MyClass1 (which happens to be a class, but could just as well be a module or anything else that supports attribute lookup). You get back a plain function that takes one argument.
If you had done this:
aVar = MyClass1()
aVar.mydef1(object)
it would've failed with a TypeError. When Python looks up a method on an instance of a class, the attribute lookup machinery (the descriptor protocol) has special magic to bind the first argument to the same object the method was retrieved from. It then returns a bound method, which now takes one less argument.
I would recommend fiddling around in the interpreter: type in your MyClass1 definition, then evaluate MyClass1.mydef1 and aVar = MyClass1(); aVar.mydef1 and observe the difference in the results.
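Roughly what you'd see (the memory addresses will differ):

>>> class MyClass1:
...     def mydef1(self):
...         return "Hello"
...
>>> MyClass1.mydef1
<function MyClass1.mydef1 at 0x7f...>
>>> aVar = MyClass1()
>>> aVar.mydef1
<bound method MyClass1.mydef1 of <__main__.MyClass1 object at 0x7f...>>
>>> aVar.mydef1()          # aVar is already bound as self, so no argument is needed
'Hello'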
If you come from a language like C++ or Java, this can all seem very confusing. But, it's actually a very regular and logical structure. Everything works the same way.
Also, as people have pointed out, names have no type associated with them. The type is associated with the object the name references. So any name can reference any kind of thing. This is also referred to as 'dynamic typing'. Python is dynamically typed in another way as well. You can actually mess around with the internal structure of something and change the type of an object as well. This is fairly deep magic, and I wouldn't suggest doing it until you know what you're doing. And even then you shouldn't do it as it will just confuse everybody else.
Python is dynamically typed, so it doesn't care what gets passed. It only cares that the single required parameter gets an argument as a value. Once inside the function, you never use self, so it doesn't matter what the argument was; you can't misuse what you don't use in the first place.
This question only arises because you are taking the uncommon action of running an instance method as an unbound method with an explicit argument, rather than invoking it on an instance of the class and letting the Python runtime system take care of passing that instance as the first argument to mydef1: MyClass1().mydef1() == MyClass1.mydef1(MyClass1()).
Python is not a statically-typed language, so you can pass objects of any type to any function as long as you pass the right number of arguments, and the self parameter of a method is no different from any other parameter.
There is no problem with that whatsoever - self is an object like any other and may be used in any context where object of its type/behavior would be welcome.
Is saying:
if not callable(output.write):
    raise ValueError("Output class must have a write() method")
The same as saying:
if type(output.write) != types.MethodType:
    raise exceptions.ValueError("Output class must have a write() method")
I would rather not use the types module if I can avoid it.
No, they are not the same.
callable(output.write) just checks whether output.write is callable. Things that are callable include:
Bound method objects (whose type is types.MethodType).
Plain-old functions (whose type is types.FunctionType)
partial instances wrapping bound method objects (whose type is functools.partial)
Instances of your own custom callable class with a __call__ method that are designed to be indistinguishable from bound method objects (whose type is your class).
Instances of a subclass of the bound method type (whose type is that subclass).
…
type(output.write) == types.MethodType accepts only the first of these. Nothing else, not even subclasses of MethodType, will pass. (If you want to allow subclasses, use isinstance(output.write, types.MethodType).)
The former is almost certainly what you want. If I've monkeypatched an object to replace the write method with something that acts just like a write method when called, but isn't implemented as a bound method, why would your code want to reject my object?
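To make the difference concrete, here's a small sketch (the Output class is made up):

import types

class Output:
    def write(self, data):
        print("real method:", data)

out = Output()
# Monkeypatch the instance with a plain function that acts like a write method:
out.write = lambda data: print("patched:", data)

print(callable(out.write))                    # True  -- the duck-typing check passes
print(type(out.write) == types.MethodType)    # False -- the strict type check rejects it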
As for your side question in the comments:
I do want to know if the exceptions.ValueError is necessary
No, it's not.
In Python 2.7, the builtin exceptions are also available in the exceptions module:
>>> ValueError is exceptions.ValueError
True
In Python 3, the exceptions module is gone, and the builtin exceptions live in builtins along with all the other builtins:
>>> ValueError is builtins.ValueError
True
But either way, the only reason you'd ever need to refer to its module is if you hid ValueError with a global of the same name in your own module.
One last thing:
As user2357112 points out in a comment, your solution doesn't really ensure anything useful.
The most common problem is almost certainly going to be output.write not existing at all, in which case you're going to get an AttributeError rather than the ValueError you wanted. (If that's acceptable, you don't need to check anything: just call the method and you'll get an AttributeError if it doesn't exist, and a TypeError if it does but isn't callable.) You could solve that by using getattr(output, 'write', None) instead of output.write, because None is not callable.
The next most common problem is probably going to be output.write existing, and being callable, but with the wrong signature. Which means you'll still get the same TypeError you were trying to avoid when you try to call it. You could solve that by, e.g., using the inspect module.
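Putting those two fixes together might look something like this (the Output class here is just a stand-in):

import inspect

class Output:
    def write(self, data):
        pass

output = Output()

write = getattr(output, 'write', None)   # None if the attribute doesn't exist at all
if not callable(write):
    raise ValueError("Output class must have a write() method")

# Optionally check that write() actually accepts a single positional argument:
try:
    inspect.signature(write).bind("some data")
except TypeError:
    raise ValueError("write() must accept a single data argument")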
But if you really want to do all of this, you should probably be factoring it all out into an ABC. ABCs only have built-in support for checking that abstract methods exist as attributes; they don't check whether those attributes are callable, or callable with the right signature. But it's not that hard to extend that support. (Or, maybe better, just grab one of the interface/protocol modules off PyPI.) And I think something like isinstance(output, StringWriteable) would declare your intention a lot better than a bunch of lines involving getattr or hasattr, type checking, and inspect grubbing.
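A minimal sketch of that kind of ABC, reusing the StringWriteable name from above (the FileLike class is just for demonstration):

from abc import ABC, abstractmethod

class StringWriteable(ABC):
    @abstractmethod
    def write(self, data):
        ...

    @classmethod
    def __subclasshook__(cls, subclass):
        if cls is StringWriteable:
            # Accept any class with a callable 'write' attribute, registered or not.
            return callable(getattr(subclass, 'write', None))
        return NotImplemented

class FileLike:
    def write(self, data):
        pass

print(isinstance(FileLike(), StringWriteable))   # True, via the subclass hook
print(isinstance(42, StringWriteable))           # False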
In Python 2, it was possible to convert arbitrary callables to methods of a class. Importantly, if the callable was a CPython built-in implemented in C, you could use this to give user-defined classes methods that were implemented entirely at the C layer, invoking no byte code when called.
This is occasionally useful if you're relying on the GIL to provide "lock-free" synchronization; since the GIL can only be swapped out between op codes, if all the steps in a particular part of your code can be pushed to C, you can make it behave atomically.
In Python 2, you could do something like this:
import types
from operator import attrgetter

class Foo(object):
    # ... This class maintains a member named length storing the length ...
    def __len__(self):
        return self.length  # We don't want this, because we're trying to push all work to C

# Instead, we explicitly make an unbound method that uses attrgetter to achieve
# the same result as the __len__ above, but with no byte code invoked to satisfy it
Foo.__len__ = types.MethodType(attrgetter('length'), None, Foo)
In Python 3, there is no longer an unbound method type, and types.MethodType only takes two arguments and creates only bound methods (which is not useful for Python special methods like __len__, __hash__, etc., since special methods are often looked up directly on the type, not the instance).
Is there some way of accomplishing this in Py3 that I'm missing?
Things I've looked at:
functools.partialmethod (appears to not have a C implementation, so it fails the requirements, and between the Python implementation and being much more general purpose than I need, it's slow, taking about 5 us in my tests, vs. ~200-300 ns for direct Python definitions or attrgetter in Py2, a roughly 20x increase in overhead)
Trying to make attrgetter or the like follow the non-data descriptor protocol (not possible AFAICT, can't monkey-patch in a __get__ or the like)
Trying to find a way to subclass attrgetter to give it a __get__, but of course, the __get__ needs to be delegated to C layer somehow, and now we're back where we started
(Specific to attrgetter use case) Using __slots__ to make the member a descriptor in the first place, then trying to somehow convert from the resulting descriptor for the data into something that skips the final step of binding and acquiring the real value to something that makes it callable so the real value retrieval is deferred
I can't swear I didn't miss something for any of those options though. Anyone have any solutions? Total hackery is allowed; I recognize I'm doing pathological things here. Ideally it would be flexible (to let you make something that behaves like an unbound method out of a class, a Python built-in function like hex, len, etc., or any other callable object not defined at the Python layer). Importantly, it needs to attach to the class, not each instance (both to reduce per-instance overhead, and to work correctly for dunder special methods, which bypass instance lookup in most cases).
Found a (probably CPython only) solution to this recently. It's a little ugly, being a ctypes hack to directly invoke CPython APIs, but it works, and gets the desired performance:
import ctypes
from operator import attrgetter

make_instance_method = ctypes.pythonapi.PyInstanceMethod_New
make_instance_method.argtypes = (ctypes.py_object,)
make_instance_method.restype = ctypes.py_object

class Foo:
    # ... This class maintains a member named length storing the length ...
    # Defines a __len__ method that, at the C level, fetches self.length
    __len__ = make_instance_method(attrgetter('length'))
It's an improvement over the Python 2 version in one way: since it doesn't need the class to already exist in order to make an unbound method for it, you can define it in the class body by simple assignment (whereas the Python 2 version must explicitly reference Foo twice in Foo.__len__ = types.MethodType(attrgetter('length'), None, Foo), and only after class Foo has finished being defined).
On the other hand, it doesn't actually provide a performance benefit on CPython 3.7 AFAICT, at least not for the simple case here where it replaces def __len__(self): return self.length; in fact, for __len__ accessed via len(instance) on an instance of Foo, ipython %%timeit microbenchmarks show len(instance) is ~10% slower when __len__ is defined via __len__ = make_instance_method(attrgetter('length')).

This is likely an artifact of attrgetter itself: CPython hasn't moved it to the "FastCall" protocol (called "Vectorcall" in 3.8, when it was made semi-public for provisional third-party use), while user-defined functions already benefit from it in 3.7. On top of that, attrgetter has to dynamically choose between dotted and undotted attribute lookup, and between single and multiple attribute lookup, on every call (something Vectorcall might be able to avoid by choosing a __call__ implementation appropriate to the gets being performed at construction time), which adds more overhead that the plain method avoids. It should win for more complicated cases (say, if the attribute to be retrieved is a nested attribute like self.contained.length), since attrgetter's overhead is largely fixed, while nested attribute lookup in Python means more byte code, but right now, it's not useful very often.
If they ever get around to optimizing operator.attrgetter for Vectorcall, I'll rebenchmark and update this answer.
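For reference, a rough microbenchmark sketch along the lines described above (class names invented; absolute numbers will vary by machine and CPython version):

import timeit

setup = """
import ctypes
from operator import attrgetter

make_instance_method = ctypes.pythonapi.PyInstanceMethod_New
make_instance_method.argtypes = (ctypes.py_object,)
make_instance_method.restype = ctypes.py_object

class PlainFoo:
    def __init__(self):
        self.length = 10
    def __len__(self):
        return self.length

class CFoo:
    def __init__(self):
        self.length = 10
    __len__ = make_instance_method(attrgetter('length'))

plain, cfoo = PlainFoo(), CFoo()
"""

print(timeit.timeit('len(plain)', setup=setup, number=1_000_000))  # plain def __len__
print(timeit.timeit('len(cfoo)', setup=setup, number=1_000_000))   # attrgetter-based __len__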
If I have an abstract base class called BaseData which has a method update that is overridden with different functionality in its child classes, can I have a function as follows, where the function takes any child class as an argument and calls the update method of the corresponding child class?
def date_func(BaseData, time):
    result = BaseData.update(time)
    lastrow = len(result.index)
    return result['Time'].iloc[lastrow], result['Time'].iloc[lastrow - 100]
Sure you can. Python won't care because it doesn't do any type checking.
In fact, you can use any type that provides a compatible interface independent from whether the instance derives from BaseData.
Including the name of the ABC as the name of the parameter won't restrict it to only subclasses of the ABC. All it does is make a parameter of that name.
Any object of any type can be passed in as an argument to any function or method. Any object that - in this case - doesn't have update() will cause an AttributeError to be raised, but if the argument has an update() method that can accept the one argument given, it won't cause a problem.
If you want to be certain that the first argument is an instance of BaseData (or one of its subclasses), follow these steps (a short sketch follows the list):
rename the parameter to something like data. This way the parameter name no longer shadows ("replaces within this context") the actual BaseData class
write if isinstance(data, BaseData): at the beginning of the function, tabbing everything that was already there over to be within it.
(optional) write an else clause that raises an Error. If you don't do this, then None will simply be returned when the type check fails.
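A minimal sketch of those steps (I've kept the body from your question, only adjusting the index to lastrow - 1 since .iloc is zero-based):

from abc import ABC, abstractmethod

class BaseData(ABC):
    @abstractmethod
    def update(self, time):
        ...

def date_func(data, time):
    # Step 1: the parameter is named 'data', so it no longer shadows BaseData.
    # Step 2: explicit check against the ABC.
    if isinstance(data, BaseData):
        result = data.update(time)
        lastrow = len(result.index)
        return result['Time'].iloc[lastrow - 1], result['Time'].iloc[lastrow - 100]
    # Step 3 (optional): fail loudly instead of silently returning None.
    else:
        raise TypeError("date_func() expects a BaseData instance")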
Now that you know how to do what you're asking, you should be aware that there are few worthwhile cases for doing this. Again, any object that fulfills the needed 'protocol' can work and doesn't need to necessarily be a subclass of your ABC.
This follows Python's principle of "it's easier to ask forgiveness than permission" (EAFP), which lets us assume that whoever passed in an argument gave one of a compatible type. If you're worried about the possibility of someone giving the wrong type, you can wrap the code in a try/except block that deals with the exception raised when it's wrong. This is how we "ask for forgiveness".
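In this case the EAFP version might look something like the following (again using the hypothetical date_func from above):

def date_func(data, time):
    # EAFP: just try to use the object; translate a missing update() into a clearer error.
    # (Note: this also catches AttributeErrors raised inside update() itself.)
    try:
        result = data.update(time)
    except AttributeError:
        raise TypeError("date_func() needs an object with an update(time) method") from None
    lastrow = len(result.index)
    return result['Time'].iloc[lastrow - 1], result['Time'].iloc[lastrow - 100]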
Generally, if you're going to do type checks, it's because you're prepared to handle different sets of protocols, and the ABCs that define those protocols also (preferably) define __subclasshook__() so that the check doesn't JUST ask whether the class is a 'registered' subclass, but rather whether it follows the prescribed protocol.