I want to parse an Apache access.log file with a python program in a certain way, and though I am completely new to object-oriented programming, I want to start doing it now.
I am going to create a class ApacheAccessLog, and the only thing I can imagine now, it will be doing is 'readline' method. Is it conventionally correct to inherit from the builtin file class in this case, so the class will behave just like an instance of the file class itself, or not? What is the best way of doing that?
In this case I would use delegation rather than inheritance. It means that your class should contain the file object as an attribute and invoke a readline method on it. You could pass a file object in the constructor of the logger class.
There are at least two reasons for this:
Delegation reduces coupling, for example in place of file objects you can use any other object that implements a readline method (duck typing comes handy here).
When inheriting from file the public interface of your class becomes unnecessarily broad. It includes all the methods defined on file even if these methods don't make sense in case of Apache log.
I am coming from a Java background but I am fairly confident that the same principles will apply in Python. As a rule of thumb you should never inherit from a class whose implementation you don't understand and control unless that class has been designed specifically for inheritance. If it has been designed in this way it should describe this clearly in its documentation.
The reason for this is that inheritance can potentially bind you to the implementation details of the class that you are inheriting from.
To use an example from Josh Bloch's book 'Effective Java'
If we were to extend the class ArrayList class in order to be able to count the number of items that were added to it during its life-time (not necessarily the number it currently contains) we may be tempted to write something like this.
public class CountingList extends ArrayList {
int counter = 0;
public void add(Object o) {
counter++;
super.add(0);
}
public void addAll(Collection c) {
count += c.size();
super.addAll(c);
}
// Etc.
}
Now this extension looks like it would accurately count the number of elements that were added to the list but in fact it may not. If ArrayList has implemented addAll by iterating over the Collection provided and calling its interface method addAll for each element then we will count each element added through the addAll method twice. Now the behaviour of our class is dependent on the implementation details of ArrayList.
This is of course in addition to the disadvantage of not being able to use other implementations of List with our CountingList class. Plus the disadvantages of inheriting from a concrete class that are discussed above.
It is my understanding that Python uses a similar (if not identical) method dispatch mechanism to Java and will therefore be subject to the same limitations. If someone could provide an example in Python I'm sure it would be even more useful.
It is perfectly acceptable to inherit from a built in class. In this case I'd say you're right on the money.
The log "is a" file so that tells you inheritance is ok..
General rule.
Dog "is a"n animal, therefore inherit from animal.
Owner "has a"n animal therefore don't inherit from animal.
Although it is in some cases useful to inherit from builtins, the real question here is what you want to do with the output and what's your big-picture design. I would usually write a reader (that uses a file object) and spit out whatever data class I need to hold the information I just read. It's then easy to design that data class to fit in with the rest of my design.
You should be fairly safe inheriting from a "builtin" class, as later modifications to these classes will usually be compatible with the current version.
However, you should think seriously about wether you really want to tie your class to the additional functionality provided by the builtin class. As mentioned in another answer you should consider (perhaps even prefer) using delegation instead.
As an example of why to avoid inheritance if you don't need it you can look at the java.util.Stack class. As it extends Vector it inherits all of the methods on Vector. Most of these methods break the contract implied by Stack, e.g. LIFO. It would have been much better to implement Stack using a Vector internally, only exposing Stack methods as the API. It would then have been easy to change the implementation to ArrayList or something else later, none of which is possible now due to inheritance.
You seem to have found your answer that in this case delegation is the better strategy. Nevertheless, I would like to add that, excepting delegation, there is nothing wrong with extending a built-in class, particularly if your alternative, depending on the language, is "monkey patching" (see http://en.wikipedia.org/wiki/Monkey_patch)
Related
Are there any conventions on how to implement services in Django? Coming from a Java background, we create services for business logic and we "inject" them wherever we need them.
Not sure if I'm using python/django the wrong way, but I need to connect to a 3rd party API, so I'm using an api_service.py file to do that. The question is, I want to define this service as a class, and in Java, I can inject this class wherever I need it and it acts more or less like a singleton. Is there something like this I can use with Django or should I build the service as a singleton and get the instance somewhere or even have just separate functions and no classes?
TL;DR It's hard to tell without more details but chances are you only need a mere module with a couple plain functions or at most just a couple simple classes.
Longest answer:
Python is not Java. You can of course (technically I mean) use Java-ish designs, but this is usually not the best thing to do.
Your description of the problem to solve is a bit too vague to come with a concrete answer, but we can at least give you a few hints and pointers (no pun intended):
1/ Everything is an object
In python, everything (well, everything you can find on the RHS of an assignment that is) is an object, including modules, classes, functions and methods.
One of the consequences is that you don't need any complex framework for dependency injection - you just pass the desired object (module, class, function, method, whatever) as argument and you're done.
Another consequence is that you don't necessarily need classes for everything - a plain function or module can be just enough.
A typical use case is the strategy pattern, which, in Python, is most often implemented using a mere callback function (or any other callable FWIW).
2/ a python module is a singleton.
As stated above, at runtime a python module is an object (of type module) whose attributes are the names defined at the module's top-level.
Except for some (pathological) corner cases, a python module is only imported once for a given process and is garanteed to be unique. Combined with the fact that python's "global" scope is really only "module-level" global, this make modules proper singletons, so this design pattern is actually already builtin.
3/ a python class is (almost) a singleton
Python classes are objects too (instance of type type, directly or indirectly), and python has classmethods (methods that act on the class itself instead of acting on the current instance) and class-level attributes (attributes that belong to the class object itself, not to it's instances), so if you write a class that only has classmethods and class attributes, you technically have a singleton - and you can use this class either directly or thru instances without any difference since classmethods can be called on instances too.
The main difference here wrt/ "modules as singletons" is that with classes you can use inheritance...
4/ python has callables
Python has the concept of "callable" objects. A "callable" is an object whose class implements the __call__() operator), and each such object can be called as if it was a function.
This means that you can not only use functions as objects but also use objects as functions - IOW, the "functor" pattern is builtin. This makes it very easy to "capture" some context in one part of the code and use this context for computations in another part.
5/ a python class is a factory
Python has no new keyword. Pythonc classes are callables, and instanciation is done by just calling the class.
This means that you can actually use a class or function the same way to get an instance, so the "factory" pattern is also builtin.
6/ python has computed attributes
and beside the most obvious application (replacing a public attribute by a pair of getter/setter without breaking client code), this - combined with other features like callables etc - can prove to be very powerful. As a matter of fact, that's how functions defined in a class become methods
7/ Python is dynamic
Python's objects are (usually) dict-based (there are exceptions but those are few and mostly low-level C-coded classes), which means you can dynamically add / replace (and even remove) attributes and methods (since methods are attributes) on a per-instance or per-class basis.
While this is not a feature you want to use without reasons, it's still a very powerful one as it allows to dynamically customize an object (remember that classes are objects too), allowing for more complex objects and classes creation schemes than what you can do in a static language.
But Python's dynamic nature goes even further - you can use class decorators and/or metaclasses to taylor the creation of a class object (you may want to have a look at Django models source code for a concrete example), or even just dynamically create a new class using it's metaclass and a dict of functions and other class-level attributes.
Here again, this can really make seemingly complex issues a breeze to solve (and avoid a lot of boilerplate code).
Actually, Python exposes and lets you hook into most of it's inners (object model, attribute resolution rules, import mechanism etc), so once you understand the whole design and how everything fits together you really have the hand on most aspects of your code at runtime.
Python is not Java
Now I understand that all of this looks a bit like a vendor's catalog, but the point is highlight how Python differs from Java and why canonical Java solutions - or (at least) canonical Java implementations of those solutions - usually don't port well to the Python world. It's not that they don't work at all, just that Python usually has more straightforward (and much simpler IMHO) ways to implement common (and less common) design patterns.
wrt/ your concrete use case, you will have to post a much more detailed description, but "connecting to a 3rd part API" (I assume a REST api ?) from a Django project is so trivial that it really doesn't warrant much design considerations by itself.
In Python you can write the same as Java program structure. You don't need to be so strongly typed but you can. I'm using types when creating common classes and libraries that are used across multiple scripts.
Here you can read about Python typing
You can do the same here in Python. Define your class in package (folder) called services
Then if you want singleton you can do like that:
class Service(object):
instance = None
def __new__(cls):
if cls.instance is not None:
return cls.instance
else:
inst = cls.instance = super(Service, cls).__new__()
return inst
And now you import it wherever you want in the rest of the code
from services import Service
Service().do_action()
Adding to the answer given by bruno desthuilliers and TreantBG.
There are certain questions that you can ask about the requirements.
For example one question could be, does the api being called change with different type of objects ?
If the api doesn't change, you will probably be okay with keeping it as a method in some file or class.
If it does change, such that you are calling API 1 for some scenario, API 2 for some and so on and so forth, you will likely be better off with moving/abstracting this logic out to some class (from a better code organisation point of view).
PS: Python allows you to be as flexible as you want when it comes to code organisation. It's really upto you to decide on how you want to organise the code.
My IDE keeps suggesting I convert my instance methods to static methods. I guess because I haven't referenced any self within these methods.
An example is :
class NotificationViewSet(NSViewSet):
def pre_create_processing(self, request, obj):
log.debug(" creating messages ")
# Ensure data is consistent and belongs to the sending bot.
obj['user_id'] = request.auth.owner.id
obj['bot_id'] = request.auth.id
So my question would be: do I lose anything by just ignoring the IDE suggestions, or is there more to it?
This is a matter of workflow, intentions with your design, and also a somewhat subjective decision.
First of all, you are right, your IDE suggests converting the method to a static method because the method does not use the instance. It is most likely a good idea to follow this suggestion, but you might have a few reasons to ignore it.
Possible reasons to ignore it:
The code is soon to be changed to use the instance (on the other hand, the idea of soon is subjective, so be careful)
The code is legacy and not entirely understood/known
The interface is used in a polymorphic/duck typed way (e.g. you have a collection of objects with this method and you want to call them in a uniform way, but the implementation in this class happens to not need to use the instance - which is a bit of a code smell)
The interface is specified externally and cannot be changed (this is analog to the previous reason)
The AST of the code is read/manipulated either by itself or something that uses it and expects this method to be an instance method (this again is an external dependency on the interface)
I'm sure there can be more, but failing these types of reasons I would follow the suggestion. However, if the method does not belong to the class (e.g. factory method or something similar), I would refactor it to not be part of the class.
I think that you might be mixing up some terminology - the example is not a class method. Class methods receive the class as the first argument, they do not receive the instance. In this case you have a normal instance method that is not using its instance.
If the method does not belong in the class, you can move it out of the class and make it a standard function. Otherwise, if it should be bundled as part of the class, e.g. it's a factory function, then you should probably make it a static method as this (at a minimum) serves as useful documentation to users of your class that the method is coupled to the class, but not dependent on it's state.
Making the method static also has the advantage this it can be overridden in subclasses of the class. If the method was moved outside of the class as a regular function then subclassing is not possible.
I was just working on a large class hierarchy and thought that probably all methods in a class should be classmethods by default.
I mean that it is very rare that one needs to change the actual method for an object, and whatever variables one needs can be passed in explicitly. Also, this way there would be lesser number of methods where people could change the object itself (more typing to do it the other way), and people would be more inclined to be "functional" by default.
But, I am a newb and would like to find out the flaws in my idea (if there are any :).
Having classmethods as a default is a well-known but outdated paradigm. It's called Modular Programming. Your classes become effectively modules this way.
The Object-Oriented Paradigm (OOP) is mostly considered superior to the Modular Paradigm (and it is younger). The main difference is exactly that parts of code are associated by default to a group of data (called an object) — and thus not classmethods.
It turns out in practice that this is much more useful. Combined with other OOP architectural ideas like inheritance this offers directer ways to represent the models in the heads of the developers.
Using object methods I can write abstract code which can be used for objects of various types; I don't have to know the type of the objects while writing my routine. E. g. I can write a max() routine which compares the elements of a list with each other to find the greatest. Comparing then is done using the > operator which is effectively an object method of the element (in Python this is __gt__(), in C++ it would be operator>() etc.). Now the object itself (maybe a number, maybe a date, etc.) can handle the comparison of itself with another of its type. In code this can be written as short as
a > b # in Python this calls a.__gt__(b)
while with only having classmethods you would have to write it as
type(a).__gt__(a, b)
which is much less readable.
If the method doesn't access any of an object's state, but is specific to that object's class, then it's a good candidate for being a classmethod.
Otherwise if it's more general, then just use a function defined at module level, no need to make it belong to a specific class.
I've found that classmethods are actually pretty rare in practice, and certainly not the default. There should be plenty of good code out there (on e.g. github) to get examples from.
I realize that in most cases, it's preferred in Python to just access attributes directly, since there's no real concept of encapsulation like there is in Java and the like. However, I'm wondering if there aren't any exceptions, particularly with abstract classes that have disparate implementations.
Let's say I'm writing a bunch of abstract classes (because I am) and that they represent things having to do with version control systems like repositories and revisions (because they do). Something like an SvnRevision and an HgRevision and a GitRevision are very closely semantically linked, and I want them to be able to do the same things (so that I can have code elsewhere that acts on any kind of Repository object, and is agnostic of the subclass), which is why I want them to inherit from an abstract class. However, their implementations vary considerably.
So far, the subclasses that have been implemented share a lot of attribute names, and in a lot of code outside of the classes themselves, direct attribute access is used. For example, every subclass of Revision has an author attribute, and a date attribute, and so on. However, the attributes aren't described anywhere in the abstract class. This seems to me like a very fragile design.
If someone wants to write another implementation of the Revision class, I feel like they should be able to do so just by looking at the abstract class. However, an implementation of the class that satisfies all of the abstract methods will almost certainly fail, because the author won't know that they need attributes called 'author' and 'date' and so on, so code that tries to access Revision.author will throw an exception. Probably not hard to find the source of the problem, but irritating nonetheless, and it just feels like an inelegant design.
My solution was to write accessor methods for the abstract classes (get_id, get_author, etc.). I thought this was actually a pretty clean solution, since it eliminates arbitrary restrictions on how attributes are named and stored, and just makes clear what data the object needs to be able to access. Any class that implements all of the methods of the abstract class will work... that feels right.
Anyways, the team I'm working with hates this solution (seemingly for the reason that accessors are unpythonic, which I can't really argue with). So... what's the alternative? Documentation? Or is the problem I'm imagining a non-issue?
Note: I've considered properties, but I don't think they're a cleaner solution.
Note: I've considered properties, but I don't think they're a cleaner solution.
But they are. By using properties, you'll have the class signature you want, while being able to use the property as an attribute itself.
def _get_id(self):
return self._id
def _set_id(self, newid):
self._id = newid
Is likely similar to what you have now. To placate your team, you'd just need to add the following:
id = property(_get_id, _set_id)
You could also use property as a decorator:
#property
def id(self):
return self._id
#id.setter
def id(self, newid):
self._id = newid
And to make it readonly, just leave out set_id/the id.setter bit.
You've missed the point. It isn't the lack of encapsulation that removes the need for accessors, it's the fact that, by changing from a direct attribute to a property, you can add an accessor at a later time without changing the published interface in any way.
In many other languages, if you expose an attribute as public and then later want to wrap some code round it on access or mutation then you have to change the interface and anyone using the code has at the very least to recompile and possibly to edit their code also. Python isn't like that: you can flip flop between attribute or property just as much as you want and no code that uses the class will break.
Only behind a property.
"they should be able to do so just by looking at the abstract class"
Don't know what this should be true. A "programmer's guide", a "how to extend" document, plus some training seems appropriate to me.
"the author won't know that they need attributes called 'author' and 'date' and so on".
In that case, the documentation isn't complete. Perhaps the abstract class needs a better docstring. Or a "programmer's guide", or a "how to extend" document.
Also, it doesn't seem very difficult to (1) document these attributes in the docstring and (2) provide default values in the __init__ method.
What's wrong with providing extra support for programmers?
It sounds like you have a social problem, not a technical one. Writing code to solve a social problem seems like a waste of time and money.
The discussion already ended a year ago, but this snippet seemed telltale, it's worth discussing:
However, an implementation of the
class that satisfies all of the
abstract methods will almost certainly
fail, because the author won't know
that they need attributes called
'author' and 'date' and so on, so code
that tries to access Revision.author
will throw an exception.
Uh, something is deeply wrong. (What S. Lott said, minus the personal comments).
If these are required members, aren't they referenced (if not required) in the constructor, and defined by docstring? or at very least, as required args of methods, and again documented?
How could users of the class not know what the required members are?
To be the devil's advocate, what if your constructor(s) requires you to supply all the members that will/may be required, what issue does that cause?
Also, are you checking the parameters when passed, and throwing informative exceptions?
(The ideological argument of accessor-vs-property is a sidebar. Properties are preferable but I don't think that's the issue with your class design.)
I recently discovered metaclasses in python.
Basically a metaclass in python is a class that creates a class. There are many useful reasons why you would want to do this - any kind of class initialisation for example. Registering classes on factories, complex validation of attributes, altering how inheritance works, etc. All of this becomes not only possible but simple.
But in python, metaclasses are also plain classes. So, I started wondering if the abstraction could usefully go higher, and it seems to me that it can and that:
a metaclass corresponds to or implements a role in a pattern (as in GOF pattern languages).
a meta-metaclass is the pattern itself (if we allow it to create tuples of classes representing abstract roles, rather than just a single class)
a meta-meta-metaclass is a pattern factory, which corresponds to the GOF pattern groupings, e.g. Creational, Structural, Behavioural. A factory where you could describe a case of a certain type of problem and it would give you a set of classes that solved it.
a meta-meta-meta-metaclass (as far as I could go), is a pattern factory factory, a factory to which you could perhaps describe the type of your problem and it would give you a pattern factory to ask.
I have found some stuff about this online, but mostly not very useful. One problem is that different languages define metaclasses slightly differently.
Has anyone else used metaclasses like this in python/elsewhere, or seen this used in the wild, or thought about it? What are the analogues in other languages? E.g. in C++ how deep can the template recursion go?
I'd very much like to research it further.
This reminds me of the eternal quest some people seem to be on to make a "generic implementation of a pattern." Like a factory that can create any object (including another factory), or a general-purpose dependency injection framework that is far more complex to manage than simply writing code that actually does something.
I had to deal with people intent on abstraction to the point of navel-gazing when I was managing the Zend Framework project. I turned down a bunch of proposals to create components that didn't do anything, they were just magical implementations of GoF patterns, as though the pattern were a goal in itself, instead of a means to a goal.
There's a point of diminishing returns for abstraction. Some abstraction is great, but eventually you need to write code that does something useful.
Otherwise it's just turtles all the way down.
To answer your question: no.
Feel free to research it further.
Note, however, that you've conflated design patterns (which are just ideas) with code (which is an implementation.)
Good code often reflects a number of interlocking design patterns. There's no easy way for formalize this. The best you can do is a nice picture, well-written docstrings, and method names that reflect the various design patterns.
Also note that a meta-class is a class. That's a loop. There's no higher level of abstractions. At that point, it's just intent. The idea of meta-meta-class doesn't mean much -- it's a meta-class for meta-classes, which is silly but technically possible. It's all just a class, however.
Edit
"Are classes that create metaclasses really so silly? How does their utility suddenly run out?"
A class that creates a class is fine. That's pretty much it. The fact that the target class is a meta class or an abstract superclass or a concrete class doesn't matter. Metaclasses make classes. They might make other metaclasses, which is weird, but they're still just metaclasses making classes.
The utility "suddenly" runs out because there's no actual thing you need (or can even write) in a metaclass that makes another metaclass. It isn't that it "suddenly" becomes silly. It's that there's nothing useful there.
As I seed, feel free to research it. For example, actually write a metaclass that builds another metaclass. Have fun. There might be something useful there.
The point of OO is to write class definitions that model real-world entities. As such, a metaclass is sometimes handy to define cross-cutting aspects of several related classes. (It's a way to do some Aspect-Oriented Programming.) That's all a metaclass can really do; it's a place to hold a few functions, like __new__(), that aren't proper parts of the class itself.
During the History of Programming Languages conference in 2007, Simon Peyton Jones commented that Haskell allows meta programming using Type Classes, but that its really turtles all the way down. You can meta-meta-meta-meta etc program in Haskell, but that he's never heard of anyone using more than 3 levels of indirection.
Guy Steele pointed out that its the same thing in Lisp and Scheme. You can do meta-programming using backticks and evals (you can think of a backtick as a Python lambda, kinda), but he's never seen more than 3 backticks used.
Presumably they have seen more code than you or I ever has, so its only a slight exaggeration to say that no-one has ever gone beyond 3 levels of meta.
If you think about it, most people don't ever use meta-programming, and two levels is pretty hard to wrap your head around. I would guess that three is nearly impossible, and the that last guy to try four ended up in an asylum.
Since when I first understood metaclasses in Python, I kept wondering "what could be done with a meta-meta class?". This is at least 10 years ago - and now, just a couple months ago, it became clear for me that there is one mechanism in Python class creation that actually involves a "meta-meta" class. And therefore, it is possible to try to imagine some use for that.
To recap object instantiation in Python: Whenever one instantiates an object in Python by "calling" its class with the same syntax used for calling an ordinary function, the class's __new__ and __init__. What "orchestrates" the calling of these methods on the class is exactly the class'metaclass' __call__ method. Usually when one writes a metaclass in Python, either the __new__ or __init__ method of the metaclass is customized.
So, it turns out that by writing a "meta-meta" class one can customize its __call__ method and thus control which parameters are passed and to the metaclass's __new__ and __init__ methods, and if some other code is to be called before of after those. What turns out in the end is that metcalsses themselves are usually hardcoded and one needs just a few, if any, even in very large projects. So any customization that might be done at the "meta meta" call is usually done directly on the metaclass itself.
And them, there are those other less frequent uses for Python metaclasses - one can customize an __add__ method in a metaclass so that the classes they define are "addable", and create a derived class having the two added classes as superclasses. That mechanism is perfectly valid with metaclasses as well - therefore, so just we "have some actual code", follows an example of "meta-meta" class that allows one to compose "metaclasses" for a class just by adding them on class declaration:
class MM(type):
def __add__(cls, other):
metacls = cls.__class__
return metacls(cls.__name__ + other.__name__, (cls, other), {})
class M1(type, metaclass=MM):
def __new__(metacls, name, bases, namespace):
namespace["M1"] = "here"
print("At M1 creation")
return super().__new__(metacls, name, bases, namespace)
class M2(type, metaclass=MM):
def __new__(metacls, name, bases, namespace):
namespace["M2"] = "there"
print("At M2 creation")
return super().__new__(metacls, name, bases, namespace)
And we can see that working on the interactive console:
In [22]: class Base(metaclass = M1 + M2):
...: pass
...:
At M1 creation
At M2 creation
Note that as different metaclasses in Python are usually difficult to combine, this can actually be useful by allowing a user-made metaclass to be combined with a library's or stdlib one, without this one having to be explicitly declared as parent of the former:
In [23]: import abc
In [24]: class Combined(metaclass=M1 + abc.ABCMeta):
...: pass
...:
At M1 creation
The class system in Smalltalk is an interesting one to study. In Smalltalk, everything is an object and every object has a class. This doesn't imply that the hierarchy goes to infinity. If I remember correctly, it goes something like:
5 -> Integer -> Integer class -> Metaclass -> Metaclass class -> Metaclass -> ... (it loops)
Where '->' denotes "is an instance of".