Traversing object hierarchy pickle style - python

I'm in a need for doing some sort of processing on the objects that get pickled just before it happens. More precisely for instances of subclasses of a certain base class I would like something totally different to be pickled instead and then recreated on loading.
I'm aware of __getstate__ & __setstate__ however this is a very invasive approach. My understanding is that these are private methods (begin with double underscore: __), and as such are subject to name mangling. Therefore this effectively would force me to redefine those two methods for every single class that I want to be subject to this non standard behavior. In addition I don't really have a full control over the hierarchy of all classes.
I was wondering if there is some sort of brief way of hooking into pickling process and applying this sort of control that __getstate__ and __setstate__ give but without having to modify the pickled classes as such.
A side note for the curious ones. This is a use case taken from a project using Django and Celery. Django models are either unpickable or very unpractical and cumbersome to pickle. Therefore it's much more advisable to pickle pairs of values ID + model class instead. However sometimes it's not the model directly that is pickled but rather a dictionary of models, a list of models, a list of lists of models, you name it. This forces me to write a lot of copy-paste code that I really dislike. A need for pickling models comes itself from Django-celery setup, where functions along with their call arguments are scheduled for later execution. Unfortunately among those arguments there are usually a lot of models mixed up in some nontrivial hierarchy.
EDIT
I do have a possibility of specifying a custom serializer to be used by Celery, so it's really a question of being able to build a slightly modified serializer on top of pickle without much effort.

The only additional hooks that are related are reduce() and __reduce__ex()
http://docs.python.org/library/pickle.html
What is the difference between __reduce__ and __reduce_ex__?
Python: Ensuring my class gets pickled only with the latest protocol
Not sure if they really provide what you need in particular.

Related

Python/Django and services as classes

Are there any conventions on how to implement services in Django? Coming from a Java background, we create services for business logic and we "inject" them wherever we need them.
Not sure if I'm using python/django the wrong way, but I need to connect to a 3rd party API, so I'm using an api_service.py file to do that. The question is, I want to define this service as a class, and in Java, I can inject this class wherever I need it and it acts more or less like a singleton. Is there something like this I can use with Django or should I build the service as a singleton and get the instance somewhere or even have just separate functions and no classes?
TL;DR It's hard to tell without more details but chances are you only need a mere module with a couple plain functions or at most just a couple simple classes.
Longest answer:
Python is not Java. You can of course (technically I mean) use Java-ish designs, but this is usually not the best thing to do.
Your description of the problem to solve is a bit too vague to come with a concrete answer, but we can at least give you a few hints and pointers (no pun intended):
1/ Everything is an object
In python, everything (well, everything you can find on the RHS of an assignment that is) is an object, including modules, classes, functions and methods.
One of the consequences is that you don't need any complex framework for dependency injection - you just pass the desired object (module, class, function, method, whatever) as argument and you're done.
Another consequence is that you don't necessarily need classes for everything - a plain function or module can be just enough.
A typical use case is the strategy pattern, which, in Python, is most often implemented using a mere callback function (or any other callable FWIW).
2/ a python module is a singleton.
As stated above, at runtime a python module is an object (of type module) whose attributes are the names defined at the module's top-level.
Except for some (pathological) corner cases, a python module is only imported once for a given process and is garanteed to be unique. Combined with the fact that python's "global" scope is really only "module-level" global, this make modules proper singletons, so this design pattern is actually already builtin.
3/ a python class is (almost) a singleton
Python classes are objects too (instance of type type, directly or indirectly), and python has classmethods (methods that act on the class itself instead of acting on the current instance) and class-level attributes (attributes that belong to the class object itself, not to it's instances), so if you write a class that only has classmethods and class attributes, you technically have a singleton - and you can use this class either directly or thru instances without any difference since classmethods can be called on instances too.
The main difference here wrt/ "modules as singletons" is that with classes you can use inheritance...
4/ python has callables
Python has the concept of "callable" objects. A "callable" is an object whose class implements the __call__() operator), and each such object can be called as if it was a function.
This means that you can not only use functions as objects but also use objects as functions - IOW, the "functor" pattern is builtin. This makes it very easy to "capture" some context in one part of the code and use this context for computations in another part.
5/ a python class is a factory
Python has no new keyword. Pythonc classes are callables, and instanciation is done by just calling the class.
This means that you can actually use a class or function the same way to get an instance, so the "factory" pattern is also builtin.
6/ python has computed attributes
and beside the most obvious application (replacing a public attribute by a pair of getter/setter without breaking client code), this - combined with other features like callables etc - can prove to be very powerful. As a matter of fact, that's how functions defined in a class become methods
7/ Python is dynamic
Python's objects are (usually) dict-based (there are exceptions but those are few and mostly low-level C-coded classes), which means you can dynamically add / replace (and even remove) attributes and methods (since methods are attributes) on a per-instance or per-class basis.
While this is not a feature you want to use without reasons, it's still a very powerful one as it allows to dynamically customize an object (remember that classes are objects too), allowing for more complex objects and classes creation schemes than what you can do in a static language.
But Python's dynamic nature goes even further - you can use class decorators and/or metaclasses to taylor the creation of a class object (you may want to have a look at Django models source code for a concrete example), or even just dynamically create a new class using it's metaclass and a dict of functions and other class-level attributes.
Here again, this can really make seemingly complex issues a breeze to solve (and avoid a lot of boilerplate code).
Actually, Python exposes and lets you hook into most of it's inners (object model, attribute resolution rules, import mechanism etc), so once you understand the whole design and how everything fits together you really have the hand on most aspects of your code at runtime.
Python is not Java
Now I understand that all of this looks a bit like a vendor's catalog, but the point is highlight how Python differs from Java and why canonical Java solutions - or (at least) canonical Java implementations of those solutions - usually don't port well to the Python world. It's not that they don't work at all, just that Python usually has more straightforward (and much simpler IMHO) ways to implement common (and less common) design patterns.
wrt/ your concrete use case, you will have to post a much more detailed description, but "connecting to a 3rd part API" (I assume a REST api ?) from a Django project is so trivial that it really doesn't warrant much design considerations by itself.
In Python you can write the same as Java program structure. You don't need to be so strongly typed but you can. I'm using types when creating common classes and libraries that are used across multiple scripts.
Here you can read about Python typing
You can do the same here in Python. Define your class in package (folder) called services
Then if you want singleton you can do like that:
class Service(object):
instance = None
def __new__(cls):
if cls.instance is not None:
return cls.instance
else:
inst = cls.instance = super(Service, cls).__new__()
return inst
And now you import it wherever you want in the rest of the code
from services import Service
Service().do_action()
Adding to the answer given by bruno desthuilliers and TreantBG.
There are certain questions that you can ask about the requirements.
For example one question could be, does the api being called change with different type of objects ?
If the api doesn't change, you will probably be okay with keeping it as a method in some file or class.
If it does change, such that you are calling API 1 for some scenario, API 2 for some and so on and so forth, you will likely be better off with moving/abstracting this logic out to some class (from a better code organisation point of view).
PS: Python allows you to be as flexible as you want when it comes to code organisation. It's really upto you to decide on how you want to organise the code.

Best practice - accessing object variables

I've been making a lot of classes an Python recently and I usually just access instance variables like this:
object.variable_name
But often I see that objects from other modules will make wrapper methods to access variables like this:
object.getVariable()
What are the advantages/disadvantages to these different approaches and is there a generally accepted best practice (even if there are exceptions)?
There should never be any need in Python to use a method call just to get an attribute. The people who have written this are probably ex-Java programmers, where that is idiomatic.
In Python, it's considered proper to access the attribute directly.
If it turns out that you need some code to run when accessing the attribute, for instance to calculate it dynamically, you should use the #property decorator.
The main advantages of "getters" (the getVariable form) in my modest opinion is that it's much easier to add functionality or evolve your objects without changing the signatures.
For instance, let's say that my object changes from implementing some functionality to encapsulating another object and providing the same functionality via Proxy Pattern (composition). If I'm using getters to access the properties, it doesn't matter where that property is being fetched from, and no change whatsoever is visible to the "clients" using your code.
I use getters and such methods especially when my code is being reused (as a library for instance), by others. I'm much less picky when my code is self-contained.
In Java this is almost a requirement, you should never access your object fields directly. In Python it's perfectly legitimate to do so, but you may take in consideration the possible benefits of encapsulation that I mentioned. Still keep in mind that direct access is not considered bad form in Python, on the contrary.
making getVariable() and setVariable() methods is called enncapsulation.
There are many advantages to this practice and it is the preffered style in object-oriented programming.
By accessing your variables through methods you can add another layer of "error checking/handling" by making sure the value you are trying to set/get is correct.
The setter method is also used for other tasks like notifying listeners that the variable have changed.
At least in java/c#/c++ and so on.

Using Django Managers vs. staticmethod on Model class directly

After reading up on Django Managers, I'm still unsure how much benefit I will get by using it. It seems that the best use is to add custom queries (read-only) methods like XYZ.objects.findBy*(). But I can easily do that with static methods off of the Model classes themselves.
I prefer the latter always because:
code locality in terms of readability and easier maintenance
slightly less verbose as I don't need the objects property in my calls
Manager classes have weird rules regarding model inheritance, might as well stay clear of that.
Is there any good reason not to use static methods and instead use Manager classes?
Adding custom queries to managers is the Django convention. From the Django docs on custom managers:
Adding extra Manager methods is the preferred way to add "table-level" functionality to your models.
If it's your own private app, the convention word doesn't matter so much - indeed my company's internal codebase has a few classmethods that perhaps belong in a custom manager.
However, if you're writing an app that you're going to share with other Django users, then they'll expect to see findBy on a custom manager.
I don't think the inheritance issues you mention are too bad. If you read the custom managers and model inheritance docs, I don't think you'll get caught out. The verbosity of writing .objects is bearable, just as it is when we do queries using XYZ.objects.get() and XYZ.objects.all()
Here's a few advantages of using manager methods in my opinion:
Consistency of API. Your method findBy belongs with get, filter, aggregate and the rest. Want to know what lookups you can do on the XYZ.objects manager? It's simple when you can introspect with dir(XYZ.objects).
Static methods "clutter" the instance namespace. XYZ.findBy() is fine but if you define a static method, you can also do xyz.findBy(). Running the findBy lookup on a particular instance doesn't really make sense.
DRYness. Sometimes you can use the same manager on more than one model.
Having said all that, it's up to you. I'm not aware of a killer reason why you should not use a static method. You're an adult, it's your code, and if you don't want to write findBy as a manager method, the sky isn't going to fall in ;)
For further reading, I recommend the blog post Managers versus class methods by James Bennett, the Django release manager.

Is factory pattern meaningless in Python?

Since Python is a duck-typed language is writing factory classes meaningless in Python?
http://en.wikipedia.org/wiki/Factory_method_pattern
While there may be times when the factory pattern is unnecessary where it may be required in other languages, there are still times when it would be valid to use it - it might just be a way of making your API cleaner - for example as a way of preventing duplication of code that decides which of a series of subclasses to return.
From the Wikipedia article you linked:
Use the factory pattern when:
The creation of the object precludes reuse without significantly duplicating code.
The creation of the object requires access to information or resources not appropriate to contain within the composing object.
The lifetime management of created objects needs to be centralised to ensure consistent behavior.
All of these can still apply when the language is duck typed.
It's not exactly a factory class, but the Python standard library has at least one factory method: http://docs.python.org/library/collections.html#collections.namedtuple.
And then, of course, there's the fact that you can create classes dynamically using the type() builtin.
I wouldn't say they're meaningless so much as that Python offers a large amount of possibilities for the sort of factories you can create. It's comparatively simple even to write a factory class that creates factory classes as callable instances of itself.

python global object cache

Little question concerning app architecture:
I have a python script, running as a daemon.
Inside i have many objects, all inheriting from one class (let's name it 'entity')
I have also one main object, let it be 'topsys'
Entities are identified by pair (id, type (= class, roughly)), and they are connected in many wicked ways. They are also created and deleted all the time, and they are need to access other entities.
So, i need a kind of storage, basically dictionary of dictionaries (one for each type), holding all entities.
And the question is, what is better: attach this dictionary to 'topsys' as a object property or to class entity, as a property of the class? I would opt for the second (so entities does not need to know of existence of 'topsys'), but i am not feeling good about using properties directly in classes. Or maybe there is another way?
There's not enough detail here to be certain of what's best, but in general I'd store the actual object registry as a module-level (global) variable in the top class, and have a method in the base class to access it.
_entities = []
class entity(object):
#staticmethod
def get_entity_registry():
return _entities
Alternatively, hide _entites entirely and expose a few methods, eg. get_object_by_id, register_object, so you can change the storage of _entities itself more easily later on.
By the way, a tip in case you're not there already: you'll probably want to look into weakrefs when creating object registries like this.
There is no problem with using properties on classes. Classes are just objects, too.
In your case, with this little information available, I would go for a class property, too, because not creating dependencies ist great and will be one worry less sometimes later.

Categories

Resources