I am considering using Quantities to define a number together with its unit. This value will most likely have to be stored on disk. As you are probably aware, pickling has one major issue: if you relocate the module, unpickling will not be able to resolve the class, and you will not be able to unpickle the information. There are workarounds for this behavior, but they are, indeed, workarounds.
A solution I have imagined for this issue would be to create a string that uniquely encodes a given unit. Once you read this encoding back from disk, you pass it to a factory method in the Quantities module, which decodes it into a proper unit instance. The advantage is that even if you relocate the module, everything will still work, as long as you pass the magic string token to the factory method.
Is this a known concept?
Looks like an application of Wheeler's First Principle, "all problems in computer science can be solved by another level of indirection" (the Second Principle adds "but that will usually create another problem" ;-). Essentially, what you need is a level of indirection to identify the type -- entity-within-type will be fine with pickling-like approaches (you can study the sources of pickle.py and copy_reg.py for all the fine details of the latter).
Specifically, I believe that what you want to do is subclass pickle.Pickler and override the save_inst method. Where the current version says:
if self.bin:
    save(cls)
    for arg in args:
        save(arg)
    write(OBJ)
else:
    for arg in args:
        save(arg)
    write(INST + cls.__module__ + '\n' + cls.__name__ + '\n')
you want to write something different than just the class's module and name -- some kind of unique identifier (made up of two strings) for the class, probably held in your own registry or registries; and similarly for the save_global method.
It's even easier for your subclass of Unpickler, because the _instantiate part is already factored out in its own method: you only need to override find_class, which is:
def find_class(self, module, name):
    # Subclasses may override this
    __import__(module)
    mod = sys.modules[module]
    klass = getattr(mod, name)
    return klass
It must take two strings and return a class object; you can do that through your registries, again.
As always when registries are involved, you need to think about how to ensure you register all objects (classes) of interest, etc. One popular strategy here is to leave pickling alone, but ensure that all moves of classes, renames of modules, etc, are recorded somewhere permanent; that way, the subclassed unpickler alone can do all the work, and it can most conveniently do it all in the overridden find_class -- bypassing all issues of registration. I gather you consider this a "workaround", but to me it seems just an extremely simple, powerful and convenient implementation of the "one more level of indirection" concept, which avoids the "one more problem" issue ;-).
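For concreteness, here is a minimal sketch of that last strategy; the RENAMES table and the module and class names in it are illustrative, not from any real project:

import io
import pickle
import sys

# Permanent record of class moves and renames; the entries are hypothetical.
RENAMES = {
    # (old_module, old_name) -> (new_module, new_name)
    ('oldpkg.quantities', 'Quantity'): ('newpkg.units', 'Quantity'),
}

class RenamingUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Redirect any recorded move, then do the normal lookup.
        module, name = RENAMES.get((module, name), (module, name))
        __import__(module)
        return getattr(sys.modules[module], name)

def loads(data):
    return RenamingUnpickler(io.BytesIO(data)).load()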
After much struggle with awful defaults in Sphinx, I finally found a way to display inherited methods in subclass documentation. Unfortunately, this option is global...
autodoc_default_options = {
    ...
    'inherited-members': True,
}
Is there any way to annotate any given class to prevent inherited methods and fields from showing in that documentation?
If there is no way to base it on inheritance, is there any way to simply list all the methods I don't want to be documented for a given class in its docstring?
I'm OK... well, I'd cry a little, but I'd live if I had to list the methods that need to be documented rather than blacklisting the ones I don't want.
I know I can put :meta private: on a method definition to circumvent its inclusion in documentation (sort of, not really, but let's pretend it works), but in the case of inherited methods there's nowhere I can attach the docstring to.
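For reference, that marker goes inside the member's own docstring, which is exactly the problem for inherited members (a minimal sketch; the method is hypothetical):

def _recompute(self):
    """Recalculate the cached totals.

    :meta private:
    """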
Note that any "solution" that involves writing .. automodule:: sections by hand is not a solution -- those must be generated.
Well, I figured out how to solve half of the problem. Brace yourself for a lot of duct tape...
So, here's an example of how you can disable inherited fields in everything that directly or indirectly extends the Exception class:
import sys

def is_error(obj):
    return issubclass(obj, Exception)

# Members whose skipping is decided per-class by the associated checker.
conditionally_ignored = {
    '__reduce__': is_error,
    '__init__': is_error,
    'with_traceback': is_error,
    'args': is_error,
}

def skip_member_handler(app, objtype, membername, member, skip, options):
    ignore_checker = conditionally_ignored.get(membername)
    if ignore_checker:
        # Walk up the stack to autodoc's filter_members() frame to find
        # the class currently being documented.
        frame = sys._getframe()
        while frame.f_code.co_name != 'filter_members':
            frame = frame.f_back
        suspect = frame.f_locals['self'].object
        return not ignore_checker(suspect)
    return skip

def setup(app):
    app.connect('autodoc-skip-member', skip_member_handler)
Put the above in your conf.py. This implies that you are using autodoc to generate documentation.
In the end, I didn't go for controlling this behavior from the docstring of the class being documented. This is just too much work, and there are too many bugs in autodoc, Sphinx builders, the generated output and so on. Just not worth it.
But the general idea would be to similarly add an event handler for when the source is read, extract the information from the docstrings, replace the docstrings with patched versions that have that information stripped out, keep the extracted information somewhere until the skip handler is called, and then implement the skipping. But, like I said, there are simply too many things broken along this road.
Another approach would be to emit XML, patch it with XSLT and feed it back to Docutils, but at the time of writing the XML generator is broken (it adds namespaced attributes to the XML while never declaring the namespaces...). An old classic.
Yet another approach would have been to handle an event fired when, say, the document is generated and references are resolved. Unfortunately, at that point you are missing information such as the types of the things you are working with, the original docstrings, and so on. You would essentially be patching HTML, and HTML with a very convoluted structure at that.
I need to check the argument type in __init__(). I did it this way:
class Matrix:
    def __init__(self, matrix):
        """
        if type(matrix) != list raise argument error
        """
        if type(matrix) != list:
            raise TypeError(f"Expected list got {type(matrix)}")
        self.matrix = matrix

a = 5
new_matrix = Matrix(a)
But when the TypeError gets raised, the __init__() method has already been called. I am wondering how to do this check before it gets called. I reckon this could be done using a metaclass to "intercept" the call at some point and raise an error, but I do not know how.
First:
one usually does not make such runtime checks in Python - unless strictly necessary. The idea is that whatever gets passed to __init__ will, in this case, behave similarly enough to a list to be used in its place. That is the idea of "duck typing".
Second:
if this check is really necessary, then an if statement inside the function or method body, just like you did, is the way to do it. It does not matter that the method was run and the error was raised inside it - that is the way dynamic typing works in Python.
Third:
There actually is a way to prevent your program from ever calling __init__ with a parameter of the wrong type, and that is static type checking. Your program will error in the preparation step, when you run the checker, such as "mypy" - that is roughly the same moment in time some static languages would raise the error: when they are compiled, in an explicit step prior to being run. Static type checking can add the safety you think you need - but it is a whole new world of boilerplate and bureaucracy to code in Python. A web search for "python static type checking" can list some places where you can start to learn it - the second link I get seems rather interesting: https://realpython.com/python-type-checking/
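As a minimal sketch of that, assuming you add a type annotation to your original code (the annotation is the only change):

class Matrix:
    def __init__(self, matrix: list) -> None:
        self.matrix = matrix

a = 5
new_matrix = Matrix(a)  # mypy flags this call as an incompatible argument type, before the program ever runs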
Fourth:
If you opt for the if-based check, you should check whether the object you got is "enough of a list" for your purposes, not whether type(a) != list - that test will bar subclasses of list. not isinstance(a, list) will accept list subclasses, but still block several other object types that might just work. Depending on what you want, your code may work with any "sequence" type: in that case, you can import collections.abc.Sequence and check whether your parameter is an instance of that instead - this allows the users of your method to pass any class that has a length and can retrieve items in order.
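A minimal sketch of that check, applied to your class:

from collections.abc import Sequence

class Matrix:
    def __init__(self, matrix):
        # Accept lists, tuples, and any other ordered, sized container.
        if not isinstance(matrix, Sequence):
            raise TypeError(f"Expected a sequence, got {type(matrix).__name__}")
        self.matrix = matrix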
And, just repeating it again: there is absolutely no problem in making this check inside the method. It could be factored out by creating a decorator that does the type checking - in fact, there are Python packages that take the type annotations, just as they are used by static type checking tools, and perform runtime checks with them. But this won't gain you any execution time. Static type checking does it before running, but the resources gained by that are negligible nonetheless.
And finally, no, this has nothing to do with a metaclass. It would be possible to use a metaclass to add decorators to all your methods, and have those decorators perform the runtime checking - but you might just as well use the decorator explicitly anyway.
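For illustration, a minimal sketch of such a decorator (the name and behaviour are mine, not from any package):

from functools import wraps

def typechecked_first_arg(expected_type):
    """Check the type of a method's first positional argument at call time."""
    def decorator(func):
        @wraps(func)
        def wrapper(self, arg, *args, **kwargs):
            if not isinstance(arg, expected_type):
                raise TypeError(
                    f"Expected {expected_type.__name__}, got {type(arg).__name__}")
            return func(self, arg, *args, **kwargs)
        return wrapper
    return decorator

class Matrix:
    @typechecked_first_arg(list)
    def __init__(self, matrix):
        self.matrix = matrix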
I was trying to monkey patch the NetworkX Graph class without typing out
networkx.Graph.method_name = method_name
for every single method I defined. I tried this (minimal version):
import networkx

class _GraphExtended(networkx.Graph):
    def is_nonnull(self):
        return bool(self.nodes())

for key in _GraphExtended.__dict__:
    networkx.Graph[key] = _GraphExtended[key]
and I got the error "'type' object is not subscriptable" for every key. How do I monkey patch all methods using a loop?
Analysis of your current approach
You are using subscript notation via the square brackets. Normally, you would write my_object[key], which is translated as a first approximation* into my_object.__getitem__(key).
In particular, if type(my_object) does not define the __getitem__ attribute, then you effectively get an error saying that type(my_object) is not subscriptable.
In your case, type(_GraphExtended) == type holds true, and the type class does not define any __getitem__ attribute. That is why you get the error message that type is not subscriptable.
*For the sake of completeness, a more accurate translation would be along the lines of: object.__getattribute__(my_object, '__getitem__')(key).
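A small sketch that shows the translation, and the error, in action:

class Box:
    def __getitem__(self, key):
        return f"got {key!r}"

box = Box()
assert box['a'] == box.__getitem__('a')  # instance subscripting calls __getitem__

try:
    Box['a']  # the class's own type (type) defines no __getitem__
except TypeError as e:
    print(e)  # 'type' object is not subscriptable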
What you probably intended
What you probably intended was to set the 'method_name' attribute on the networkx.Graph class. In general, this can be accomplished by using the setattr built-in function, as follows:
setattr(networkx.Graph, key, value)
Also, _GraphExtended.__dict__ contains many more keys than what you intend to monkey patch. You may be able to filter out those that start and end with double underscore, but I am neither confident that this filter works under all circumstances nor confident that it is forward-compatible with Python.
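Putting both points together, a minimal sketch of the corrected loop (the dunder filter is the heuristic just mentioned):

for key, value in _GraphExtended.__dict__.items():
    if not (key.startswith('__') and key.endswith('__')):
        setattr(networkx.Graph, key, value)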
Pitfalls to monkey patching
Firstly, monkey patching may break forward-compatibility with the networkx library. There is no guarantee that future versions of networkx will avoid the same method names that you have chosen to monkey patch.
Secondly, monkey patching will prevent you from writing reusable code. It is no longer possible for other developers to reuse your convenience function(s) unless they monkey patch their code themselves, and there may well be unforeseen reasons why that is not possible.
Practical advice
Don't do it. I must warn you that monkey patching library code is very poor style, and should only be used as a last resort (e.g. if it were to have a positive and measurable effect on business revenue or a related resource thereof, such as development time).
What are the underlying concern(s) that you wish to solve? I would be willing to follow up with alternative solutions that address each underlying concern you may have.
Also, have you considered the simple approach of defining a helper module containing helper functions, such as:
# Module graph_utils

def is_nonnull(graph):
    return bool(graph.nodes())
Other notes
Python already has a convention for handling boolean contexts: anything considered empty should also be considered False. For example, according to the networkx documentation, the Graph class defines a __len__ method that returns the number of nodes. Because of __len__, Python allows using Graph objects in contexts where a bool is expected. For instance,
graph = networkx.Graph()
print(not graph)  # Prints True iff len(graph) == 0

if graph:
    print('Graph is nonnull.')
else:
    print('Graph is null.')
I know this may sound like a stupid question, especially to someone who knows Python's nature, but I was just wondering: is there a way to know if an object "implements an interface", so to speak?
To give an example of what I want to say:
let's say I have this function:
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts
My question is: Is there a way to make sure that the object passed to the function is iterable? I know that in Java or C# I could do this by having the method accept any object that implements a specific interface, let's say (for example) iIterable like this: void get_counts(iIterable sequence)
My guess is that in Python I would have to employ preemptive introspection checks (in a decorator, perhaps?) and throw a custom exception if the object doesn't have an __iter__ attribute. But is there a more pythonic way to do this?
Use polymorphism and duck-typing before isinstance() or interfaces
You generally define what you want to do with your objects, then either use polymorphism to adjust how each object responds to what you want to do, or you use duck typing: test whether the object at hand can do the thing you want to do in the first place. This is the invocation versus introspection trade-off; conventional wisdom states that invocation is preferable to introspection, but in Python, duck typing is preferred over isinstance testing.
So you need to work out why you need to filter on whether or not something is iterable in the first place; why do you need to know this? Just use a try: iter(object), except TypeError: # not iterable test.
Or perhaps you just need to throw an exception if whatever that was passed was not an iterable, as that would signal an error.
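Applied to the get_counts function from the question, that test would look something like this (a minimal sketch):

def get_counts(sequence):
    try:
        iterator = iter(sequence)  # raises TypeError for non-iterables
    except TypeError:
        raise TypeError('get_counts() expects an iterable')
    counts = {}
    for x in iterator:
        counts[x] = counts.get(x, 0) + 1
    return counts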
ABCs
With duck-typing, you may find that you have to test for multiple methods, and so an isinstance() test may look like a better option. In such cases, using an Abstract Base Class (ABC) could also be an option; using an ABC lets you 'paint' several different classes as being the right type for a given operation. Using an ABC lets you focus on the tasks that need to be performed rather than the specific implementations used; you can have a Paintable ABC, a Printable ABC, etc.
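A minimal sketch of that idea; Paintable and the classes below are illustrative names, not a real library:

from abc import ABC, abstractmethod

class Paintable(ABC):
    @abstractmethod
    def paint(self, canvas):
        ...

class Circle(Paintable):
    def paint(self, canvas):
        print('painting a circle')

class LegacyShape:  # unrelated class that happens to have a compatible method
    def paint(self, canvas):
        print('painting a legacy shape')

# 'Paint' the unrelated class as Paintable without touching its code:
Paintable.register(LegacyShape)

assert isinstance(Circle(), Paintable)
assert isinstance(LegacyShape(), Paintable)  # one isinstance() test covers both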
Zope interfaces and component architecture
If you find your application is using an awful lot of ABCs or you keep having to add polymorphic methods to your classes to deal with various different situations, the next step is to consider using a full-blown component architecture, such as the Zope Component Architecture (ZCA).
zope.interface interfaces are ABCs on steroids, especially when combined with ZCA adapters. Interfaces document the expected behaviour of a class:
if IFrobnarIterable.providedBy(yourobject):
    # it'll support iteration and yield Frobnars.
    ...
but they also let you look up adapters; instead of putting all the behaviours for every use of shapes into your classes, you implement adapters to provide polymorphic behaviours for specific use-cases. You can adapt your objects to be printable, or iterable, or exportable to XML:
class FrobnarsXMLExport(object):
    adapts(IFrobnarIterable)
    provides(IXMLExport)

    def __init__(self, frobnariterator):
        self.frobnars = frobnariterator

    def export(self):
        entries = []
        for frobnar in self.frobnars:
            entries.append(
                u'<frobnar><width>{0}</width><height>{1}</height></frobnar>'.format(
                    frobnar.width, frobnar.height))
        return u''.join(entries)
and your code merely has to look up adapters for each shape:
for obj in setofobjects:
    self.result.append(IXMLExport(obj).export())
Python (since 2.6) has abstract base classes (a.k.a. virtual interfaces), which are more flexible than Java or C# interfaces. To check whether an object is iterable, use collections.abc.Iterable (the collections.Iterable alias worked before Python 3.10):
from collections.abc import Iterable

if isinstance(obj, Iterable):
    ...
However, if your else block would just raise an exception, then the most Pythonic answer is: don't check! It's up to your caller to pass in an appropriate type; you just need to document that you're expecting an iterable object.
The Pythonic way is to use duck typing and, "ask forgiveness, not permission". This usually means performing an operation in a try block assuming it behaves the way you expect it to, then handling other cases in an except block.
I think this is the way that the community would recommend you do it:
def do_something(foo):
    try:
        for i in foo:
            process(i)
    except TypeError as ex:
        if "is not iterable" in str(ex):
            print("Is not iterable")
        else:
            raise

do_something(True)
Or, you could use something like zope.interface.
It seems there are different conventions for what the __repr__ method should return.
I have a class InfoObj that stores a number of things, some of which I don't particularly want users of the class to set by themselves. I recognize nothing is truly protected in Python and they could just dive in and set them anyway, but defining them in __init__ seems to make it more likely that someone will see them and assume it's fine to just pass them in.
(Example: Booleans that get set by a validation function when it determines that the object has been fully populated, and values that get calculated from other values when enough information is stored to do so... e.g. A = B + C, so once A and B are set then C is calculated and the object is marked Valid=True.)
So, given all that, which is the best way to design the output of __repr__?
bob = InfoObj(Name="Bob")
# Populate bob.
# Output type A:
bob.__repr__()
'<InfoObj object at 0x1b91ca42>'
# Output type B:
bob.__repr__()
'InfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
# Output type C:
bob.__repr__()
'InfoObj.NewInfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
... the point of type C would be to not happily accept all the stuff I'd have marked 'private' in C++ as arguments to the constructor, and to make teammates using the class set it up through the interface functions even if it's more work for them. In that case I would define a constructor that does not take certain things in, and a separate function, slightly harder to notice, for the purposes of __repr__.
If it makes any difference, I am planning to store these python objects in a database using their __repr__ output and retrieve them using eval(), at least unless I come up with a better way. The consequence of a teammate creating a full object manually instead of going through the proper interface functions is just that one type of info retrieval might be unstable until someone figures out what he did.
The __repr__ method is designed to produce the most useful output for the developer, not the end user, so only you can really answer this question. However, I'd typically go with option B. Option A isn't very useful, and option C is needlessly verbose -- you don't know how your module is imported anyway. Others may prefer option C.
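A minimal sketch of an option-B style __repr__, using a couple of the hypothetical attributes from your example:

class InfoObj:
    def __init__(self, Name):
        self.Name = Name
        self.Valid = False

    def __repr__(self):
        return 'InfoObj(Name={0.Name!r}, Valid={0.Valid!r})'.format(self)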
However, if you want to store Python objects in a database, use pickle.
>>> import pickle
>>> bob = InfoObj(Name="Bob")
>>> pickle.dumps(bob)
b'...some bytestring representation of Bob...'
>>> pickle.loads(pickle.dumps(bob))
Bob(...)
If you're using older Python (pre-3.x), then note that cPickle is faster, but pickle is more extensible. Pickle will work on some of your classes without any configuration, but for more complicated objects you might want to write custom picklers.
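For more complicated objects, the standard __getstate__/__setstate__ hooks are the usual starting point. A minimal sketch, where the _cache attribute and its rebuild step are hypothetical:

class InfoObj:
    def __init__(self, Name):
        self.Name = Name
        self._cache = self._build_cache()  # hypothetical derived state

    def _build_cache(self):
        return {}

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['_cache']  # don't persist derived state
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._cache = self._build_cache()  # rebuild after unpickling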