I was recently going over a coding problem I was having and someone looking at the code said that subclassing list was bad (my problem was unrelated to that class). He said that you shouldn't do it and that it came with a bunch of bad side effects. Is this true?
I'm asking if list is generally bad to subclass and if so, what are the reasons. Alternately, what should I consider before subclassing list in Python?
The abstract base classes provided in the collections module, particularly MutableSequence, can be useful when implementing list-like classes. These are available in Python 2.6 and later.
With ABCs you can implement the "core" functionality of your class and it will provide the methods which logically depend on what you've defined.
For example, implementing __getitem__ in a collections.Sequence-derived class will be enough to provide your class with __contains__, __iter__, and other methods.
You may still want to use a contained list object to do the heavy lifting.
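To make that concrete, here is a minimal sketch (class and attribute names are illustrative) of a list-like class built on MutableSequence, with a contained list doing the heavy lifting:

```python
from collections.abc import MutableSequence  # plain `collections` in Python 2.6/2.7

class TypedList(MutableSequence):
    """A list-like container that only accepts items of a given type."""

    def __init__(self, item_type, iterable=()):
        self._items = []            # the contained list does the heavy lifting
        self._item_type = item_type
        self.extend(iterable)       # extend() comes free from MutableSequence

    def _check(self, value):
        if not isinstance(value, self._item_type):
            raise TypeError("expected %s" % self._item_type.__name__)

    # The five abstract methods; append, extend, __contains__, __iter__,
    # index, remove, pop, etc. are all derived from these by the ABC.
    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._check(value)
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._check(value)
        self._items.insert(index, value)
```

With only those five methods defined, `nums = TypedList(int, [1, 2]); nums.append(3)` works, `2 in nums` works, and `nums.append("x")` raises a TypeError.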
There are no benefits to subclassing list. None of the built-in list methods will call the methods you override, so you can get unexpected bugs. Further, it's often confusing to write self.append instead of self.foos.append, or especially self[4] rather than self.foos[4], to access your data. You can make something that works exactly like a list, or (better) exactly as much like a list as you really want, while subclassing only object.
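A short sketch of the kind of surprise this causes: the hypothetical AuditedList below overrides append, yet extend and += bypass the override entirely, because the built-in C implementations never call back into the subclass.

```python
class AuditedList(list):
    """A hypothetical list subclass that tries to track every addition."""

    def __init__(self, *args):
        super().__init__(*args)
        self.additions = 0

    def append(self, item):
        self.additions += 1
        super().append(item)

al = AuditedList()
al.append(1)       # goes through our override
al.extend([2, 3])  # list.extend does NOT call our append
al += [4]          # neither does in-place concatenation
print(al)            # [1, 2, 3, 4]
print(al.additions)  # 1, not 4: three additions went untracked
```

Note also that `al + [5]` returns a plain list, not an AuditedList, which is another way the subclass silently leaks away.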
I think the first question I'd ask myself is, "Is my new object really a list?". Does it walk like a list, talk like a list? Or is it something else?
If it is a list, then all the standard list methods should all make sense.
If the standard list methods don't make sense, then your object should contain a list, not be a list.
In old Python (2.2?), subclassing list was a bad idea for various technical reasons, but in a modern Python it is fine.
Nick is correct.
Also, while I can't speak to Python, in other OO languages (Java, Smalltalk) subclassing a list is a bad idea. Inheritance in general should be avoided and delegation-composition used instead.
Rather, you make a container class and delegate calls to the list. The container class has a reference to the list and you can even expose the calls and returns of the list in your own methods.
This adds flexibility and allows you to change the implementation (a different list type or data structure) later w/o breaking any code. If you want your list to do different listy-type things then your container can do this and use the plain list as a simple data structure.
Imagine if you had 47 different uses of lists. Do you really want to maintain 47 different subclasses?
Instead you could do this via the container and interfaces. One class to maintain and allow people to call your new and improved methods via the interface(s) with the implementation remaining hidden.
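A minimal sketch of such a container (the class and method names are just for illustration):

```python
import random

class Playlist(object):
    """Wraps a plain list and exposes only the listy operations we want."""

    def __init__(self, songs=None):
        self._songs = list(songs) if songs else []   # the delegate

    def add(self, song):
        self._songs.append(song)                     # delegate the call

    def __len__(self):
        return len(self._songs)

    def __iter__(self):
        return iter(self._songs)

    # new-and-improved behavior lives here, not in 47 subclasses
    def shuffle(self):
        random.shuffle(self._songs)
```

Because callers only see add, len, iteration, and shuffle, you could later swap the internal list for a deque or a database-backed sequence without breaking any client code.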
I'm coming from Java and learning Python. So far what I found very cool, yet very hard to adapt, is that there's no need to declare types. I understand that each variable is a pointer to an object, but so far I'm not able to understand how to design my code then.
For example, I'm writing a function that accepts a 2D NumPy array. Then in the body of the function I'm calling different methods of this array (which is an object of array in Numpy). But then in the future suppose I want to use this function, by that time I might have forgotten totally what I should pass to the function as a type. What do people normally do? Do they just write documentation for this? Because if that is the case, then this involves more typing and would raise the question about the idea of not declaring the type.
Also suppose I want to pass an object similar to an array in the future. Normally in Java one would implement an interface and then let both classes implement the methods. Then in the function parameters I define the variable to be of the type of the interface. How can this issue be solved in Python, or what approaches can be used to achieve the same idea?
This is a very healthy question.
Duck typing
The first thing to understand about python is the concept of duck typing:
If it walks like a duck, and quacks like a duck, then I call it a duck
Unlike Java, Python's types are never declared explicitly. There is no restriction, at compile time or at runtime, on the type an object can assume.
What you do is simply treat objects as if they were of the perfect type for your needs. You don't ask or wonder about its type. If it implements the methods and attributes you want it to have, then that's that. It will do.
def foo(duck):
    duck.walk()
    duck.quack()
The only contract of this function is that duck exposes walk() and quack(). A more refined example:
def foo(sequence):
    for item in sequence:
        print(item)
What is sequence? A list? A numpy array? A dict? A generator? It doesn't matter. If it's iterable (that is, it can be used in a for ... in), it serves its purpose.
Type hinting
Of course, no one can live in constant fear of objects being of the wrong type. This is addressed with coding style, conventions and good documentation. For example:
A variable named count should hold an integer
A variable Foo starting with an upper-case letter should hold a type (class)
An argument bar whose default value is False should hold a bool too when overridden
Note that the duck typing concept can be applied to these 3 examples:
count can be any object that implements +, -, and <
Foo can be any callable that returns an object instance
bar can be any object that implements __nonzero__ (__bool__ in Python 3)
In other words, the type is never defined explicitly, but always strongly hinted at. Or rather, the capabilities of the object are always hinted at, and its exact type is not relevant.
It's very common to use objects of unknown types. Most frameworks expose types that look like lists and dictionaries but aren't.
Finally, if you really need to know, there's the documentation. You'll find python documentation vastly superior to Java's. It's always worth the read.
I've reviewed a lot of Python code written by Java and .Net developers, and I've repeatedly seen a few issues I might warn/inform you about:
Python is not Java
Don't wrap everything in a class:
Seems like even the simplest function winds up being wrapped in a class when Java developers start writing Python. Python is not Java. Don't write getters and setters, that's what the property decorator is for.
I have two predicates before I consider writing classes:
I am marrying state with functionality
I expect to have multiple instances (otherwise a module level dict and functions is fine!)
Don't type-check everything
Python uses duck-typing. Refer to the data model. Its builtin type coercion is your friend.
Don't put everything in a try-except block
Only catch exceptions you know you'll get; using exceptions everywhere for control flow is computationally expensive and can hide bugs. Catch the most specific exception you expect. This leads to more robust code over the long run.
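A small sketch of the difference, assuming we're parsing user-supplied text for a port number:

```python
def parse_port(text):
    """Parse a port number, falling back to a default on bad input."""
    # Catch only the failure we expect from int(); a bare `except`
    # here would also swallow unrelated bugs like NameErrors.
    try:
        port = int(text)
    except ValueError:
        port = 8080  # fall back only when the input is malformed
    return port

print(parse_port("9000"))   # 9000
print(parse_port("oops"))   # 8080
```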
Learn the built-in types and methods, in particular:
From the data-model
str
join
just do dir(str) and learn them all.
list
append (add an item on the end of the list)
extend (extend the list by adding each item in an iterable)
dict
get (provide a default that saves you from having to catch KeyErrors!)
setdefault (return the value if the key is present, otherwise set and return the default!)
fromkeys (build a dict with default values from an iterable of keys!)
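A quick sketch of those three in action (the dict contents here are made up):

```python
config = {"host": "localhost"}

# get: no KeyError and no try/except needed when the key is missing
port = config.get("port", 8080)

# setdefault: seed a key with a default, then work with it
words = {}
for word in ["spam", "eggs", "spam"]:
    words.setdefault(word, 0)
    words[word] += 1

# fromkeys: build a dict with one default value from an iterable of keys
flags = dict.fromkeys(["verbose", "debug"], False)

print(port)   # 8080
print(words)  # {'spam': 2, 'eggs': 1}
print(flags)  # {'verbose': False, 'debug': False}
```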
set
Sets contain unique (no repetition) hashable objects (like strings and numbers). Thinking Venn diagrams? Want to know if a set of strings is in a set of other strings, or what the overlaps are (or aren't)?
union
intersection
difference
symmetric_difference
issubset
isdisjoint
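A quick sketch of those operations (the group names are made up); each method also has an operator form:

```python
admins = {"alice", "bob"}
editors = {"bob", "carol"}

print(admins & editors)   # intersection: {'bob'}
print(admins | editors)   # union: alice, bob, carol
print(admins - editors)   # difference: {'alice'}
print(admins ^ editors)   # symmetric_difference: alice, carol
print({"bob"} <= admins)            # issubset: True
print(admins.isdisjoint({"dave"}))  # True, no overlap at all
```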
And just do dir() on every type you come across to see the methods and attributes in its namespace, and then do help() on the attribute to see what it does!
Learn the built-in functions and standard library:
I've caught developers writing their own max functions and set objects. It's a little embarrassing. Don't let that happen to you!
Important modules to be aware of in the Standard Library are:
os
sys
collections
itertools
pprint (I use it all the time)
logging
unittest
re (regular expressions are incredibly efficient at parsing strings for a lot of use-cases)
And peruse the docs for a brief tour of the standard library, here's Part I and here's Part II. And in general, make skimming all of the docs an early goal.
Read the Style Guides:
You will learn a lot about best practices just by reading your style guides! I recommend:
PEP 8 (anything included in the standard library is written to this standard)
Google's Python Style Guide
Your firm's, if you have one.
Additionally, you can learn great style by Googling for the issue you're looking into with the phrase "best practice" and then selecting the relevant Stackoverflow answers with the greatest number of upvotes!
I wish you luck on your journey to learning Python!
For example, I'm writing a function that accepts a 2D NumPy array. Then in the body of the function I'm calling different methods of this array (which is an object of array in NumPy). But then in the future suppose I want to use this function; by that time I might have forgotten totally what I should pass to the function as a type. What do people normally do? Do they just write documentation for this?
You write documentation and name the function and variables appropriately.
def func(two_d_array):
    ...  # do stuff with the 2D array
Also suppose I want in the future to pass an object similar to an array; normally in Java one would implement an interface and then let both classes implement the methods.
You could do this. Create a base class and inherit from it, so that multiple types have the same interface. However, quite often, this is overkill and you'd simply use duck typing instead. With duck typing, all that matters is that the object being evaluated defines the right properties and methods required to use it within your code.
Note that you can check for types in Python, but this is generally considered bad practice because it prevents you from using duck typing and other coding patterns enabled by Python's dynamic type system.
Yes, you should document what type(s) of arguments your methods expect, and it's up to the caller to pass the correct type of object. Within a method, you can write code to check the types of each argument, or you can just assume it's the correct type, and rely on Python to automatically throw an exception if the passed-in object doesn't support the methods that your code needs to call on it.
The disadvantage of dynamic typing is that the computer can't do as much up-front correctness checking, as you've noted; there's a greater burden on the programmer to make sure that all arguments are of the right type. But the advantage is that you have much more flexibility in what types can be passed to your methods:
You can write a method that supports several different types of objects for a particular argument, without needing overloads and duplicated code.
Sometimes a method doesn't really care about the exact type of an object as long as it supports a particular method or operation — say, indexing with square brackets, which works on strings, arrays, and a variety of other things. In Java you'd have to create an interface, and write wrapper classes to adapt various pre-existing types to that interface. In Python you don't need to do any of that.
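To illustrate that last point: one small Python function, with no interface or wrapper classes, works on anything that supports square-bracket indexing.

```python
def ends(seq):
    """Return the first and last items of anything supporting [] indexing."""
    return seq[0], seq[-1]

print(ends("hello"))        # ('h', 'o')
print(ends([1, 2, 3]))      # (1, 3)
print(ends((True, False)))  # (True, False)
```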
You can use assert to check whether conditions match (note that np.rank from the original answer has since been removed from NumPy; arg.ndim is the current way to get the number of dimensions):
In [218]: def foo(arg):
     ...:     assert isinstance(arg, np.ndarray) and arg.ndim == 2, \
     ...:         'the argument must be a 2D numpy array'
     ...:     print('good arg')
In [219]: foo(np.arange(4).reshape((2, 2)))
good arg
In [220]: foo(np.arange(4))
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-220-c0ee6e33c83d> in <module>()
----> 1 foo(np.arange(4))
<ipython-input-218-63565789690d> in foo(arg)
      1 def foo(arg):
----> 2     assert isinstance(arg, np.ndarray) and arg.ndim == 2, \
      3         'the argument must be a 2D numpy array'
      4     print('good arg')
AssertionError: the argument must be a 2D numpy array
It's always better to document what you've written completely, as @ChinmayKanchi mentioned.
Here are a few pointers that might help you make your approach more 'Pythonic'.
The PEPs
In general, I recommend at least browsing through the PEPs. It helped me a lot to grok Python.
Pointers
Since you mentioned the word pointers, Python doesn't use pointers to objects in the sense that C uses pointers. I am not sure about the relationship to Java. Python uses names attached to objects. It's a subtle but important difference that can cause you problems if you expect similar-to-C pointer behavior.
Duck Typing
As you said, yes, if you are expecting a certain type of input you put it in the docstring.
As zhangxaochen wrote, you can use assert to check types at runtime, but that's not really the Python way if you are doing it all the time with no particular reason. As others mentioned, it's better to test and raise a TypeError if you really have to check. Python favors duck typing instead: if you send me something that quacks like a NumPy 2D array, then that's fine.
I was just working on a large class hierarchy and thought that probably all methods in a class should be classmethods by default.
I mean that it is very rare that one needs to change the actual object from a method, and whatever variables one needs can be passed in explicitly. Also, this way there would be fewer methods through which people could change the object itself (doing it the other way would take more typing), and people would be more inclined to be "functional" by default.
But, I am a newb and would like to find out the flaws in my idea (if there are any :).
Having classmethods as the default is a well-known but outdated paradigm called modular programming; your classes effectively become modules this way.
The object-oriented paradigm (OOP) is mostly considered superior to the modular paradigm (and it is younger). The main difference is exactly that pieces of code are associated by default with a group of data (called an object), and thus are not classmethods.
It turns out in practice that this is much more useful. Combined with other OOP ideas like inheritance, this offers more direct ways to represent the models in the developers' heads.
Using object methods I can write abstract code which can be used for objects of various types; I don't have to know the type of the objects while writing my routine. For example, I can write a max() routine which compares the elements of a list with each other to find the greatest. Comparison is then done using the > operator, which is effectively an object method of the element (in Python this is __gt__(); in C++ it would be operator>(), etc.). Now the object itself (maybe a number, maybe a date, etc.) can handle the comparison of itself with another of its type. In code this can be written as short as
a > b # in Python this calls a.__gt__(b)
while with only having classmethods you would have to write it as
type(a).__gt__(a, b)
which is much less readable.
If the method doesn't access any of an object's state, but is specific to that object's class, then it's a good candidate for being a classmethod.
Otherwise if it's more general, then just use a function defined at module level, no need to make it belong to a specific class.
I've found that classmethods are actually pretty rare in practice, and certainly not the default. There should be plenty of good code out there (on e.g. github) to get examples from.
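One place classmethods genuinely earn their keep is as alternate constructors; a sketch (the Point class and format are illustrative):

```python
class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def from_string(cls, text):
        # Alternate constructor: cls is whatever class the call went
        # through, so subclasses get a working from_string for free.
        x, y = (float(part) for part in text.split(","))
        return cls(x, y)

p = Point.from_string("1.5,2.0")
print(p.x, p.y)  # 1.5 2.0
```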
I've been making a lot of classes in Python recently and I usually just access instance variables like this:
object.variable_name
But often I see that objects from other modules will make wrapper methods to access variables like this:
object.getVariable()
What are the advantages/disadvantages to these different approaches and is there a generally accepted best practice (even if there are exceptions)?
There should never be any need in Python to use a method call just to get an attribute. The people who have written this are probably ex-Java programmers, where that is idiomatic.
In Python, it's considered proper to access the attribute directly.
If it turns out that you need some code to run when accessing the attribute, for instance to calculate it dynamically, you should use the @property decorator.
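A sketch of how that plays out: start with a plain attribute, and if a value later needs to be computed, swap in a property; no caller ever changes.

```python
class Circle(object):
    def __init__(self, radius):
        self.radius = radius  # plain attribute, accessed directly

    @property
    def area(self):           # a computed value behind attribute syntax
        return 3.14159 * self.radius ** 2

c = Circle(2)
print(c.radius)  # 2, direct attribute access, no getter needed
print(c.area)    # looks like an attribute, but runs code
```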
The main advantages of "getters" (the getVariable form) in my modest opinion is that it's much easier to add functionality or evolve your objects without changing the signatures.
For instance, let's say that my object changes from implementing some functionality to encapsulating another object and providing the same functionality via Proxy Pattern (composition). If I'm using getters to access the properties, it doesn't matter where that property is being fetched from, and no change whatsoever is visible to the "clients" using your code.
I use getters and such methods especially when my code is being reused (as a library for instance), by others. I'm much less picky when my code is self-contained.
In Java this is almost a requirement, you should never access your object fields directly. In Python it's perfectly legitimate to do so, but you may take in consideration the possible benefits of encapsulation that I mentioned. Still keep in mind that direct access is not considered bad form in Python, on the contrary.
Making getVariable() and setVariable() methods is called encapsulation.
There are many advantages to this practice and it is the preferred style in object-oriented programming.
By accessing your variables through methods you can add another layer of "error checking/handling" by making sure the value you are trying to set/get is correct.
The setter method is also used for other tasks, like notifying listeners that the variable has changed.
At least in java/c#/c++ and so on.
I have a program in python that includes a class that takes a function as an argument to the __init__ method. This function is stored as an attribute and used in various places within the class. The functions passed in can be quite varied, and passing in a key and then selecting from a set of predefined functions would not give the same degree of flexibility.
Now, apologies if a long list of questions like this is not cool, but...
Is there a standard way to achieve this in a language where functions aren't first-class objects?
Do blocks, like in smalltalk or objective-C, count as functions in this respect?
Would blocks be the best way to do this in those languages?
What if there are no blocks?
Could you add a new method at runtime?
In which languages would this be possible (and easy)?
Or would it be better to create an object with a single method that performs the desired operation?
What if I wanted to pass lots of functions, would I create lots of singleton objects?
Would this be considered a more object oriented approach?
Would anyone consider doing this in python, where functions are first class objects?
I don't understand what you mean by "equivalent... using an object oriented approach". In Python, since functions are (as you say) first-class objects, how is it not "object-oriented" to pass functions as arguments?
a standard way to achieve this in a language where functions aren't first class objects?
Only to the extent that there is a standard way of functions failing to be first-class objects, I would say.
In C++, it is common to create another class, often called a functor or functionoid, which defines an overload for operator(), allowing instances to be used like functions syntactically. However, it's also often possible to get by with plain old function-pointers. Neither the pointer nor the pointed-at function is a first-class object, but the interface is rich enough.
This meshes well with "ad-hoc polymorphism" achieved through templates; you can write functions that don't actually care whether you pass an instance of a class or a function pointer.
Similarly, in Python, you can make objects register as callable by defining a __call__ method for the class.
Do blocks, like in smalltalk or objective-C, count as functions in this respect?
I would say they do. At least as much as lambdas count as functions in Python, and actually more so because they aren't crippled the way Python's lambdas are.
Would blocks be the best way to do this in those languages?
It depends on what you need.
Could you add a new method at runtime? In which languages would this be possible (and easy)?
Languages that offer introspection and runtime access to their own compiler. Python qualifies.
However, there is nothing about the problem, as presented so far, which suggests a need to jump through such hoops. Of course, some languages have more required boilerplate than others for a new class.
Or would it be better to create an object with a single method that performs the desired operation?
That is pretty standard.
What if I wanted to pass lots of functions, would I create lots of singleton objects?
You say this as if you might somehow accidentally create more than one instance of the class if you don't write tons of boilerplate in an attempt to prevent yourself from doing so.
Would this be considered a more object oriented approach?
Again, I can't fathom your understanding of the term "object-oriented". It doesn't mean "creating lots of objects".
Would anyone consider doing this in python, where functions are first class objects?
Not without a need for the extra things that a class can do and a function can't. With duck typing, why on earth would you bother?
I'm just going to answer some of your questions.
As they say in the Scheme community, "objects are a poor man's closures" (closures being first-class functions). Blocks are usually just syntactic sugar for closures. For languages that do not have closures, there exist various solutions.
One of the common solutions is to use operator overloading: C++ has a notion of function objects, which define a member operator() ("operator function call"). Python has a similar overloading mechanism, where you define __call__:
class Greeter(object):
    def __init__(self, who):
        self.who = who

    def __call__(self):
        print("Hello, %s!" % self.who)

hello = Greeter("world")
hello()  # prints: Hello, world!
Yes, you might consider using this in Python instead of storing functions in objects, since some functions (lambdas and nested functions, for example) can't be pickled.
In languages without operator overloading, you'll see things like Guava's Function interface.
You could use the strategy pattern. Basically you pass in an object with a known interface, but different behavior. It's like passing function but one that's wrapped up in an object.
In Smalltalk you'd mostly be using blocks. You can also create classes and instances at runtime.
I am quite new to python programming (C/C++ background).
I'm writing code where I need to use complex data structures like dictionaries of dictionaries of lists.
The issue is that when I must use these objects I barely remember their structure and so how to access them.
This makes it difficult to resume working on code that was untouched for days.
A very poor solution is to use comments for each variable, but that's very inflexible.
So, given that python variables are just pointers to memory and they cannot be statically type-declared, is there any convention or rule that I could follow to ease complex data structures usage?
If you use docstrings in your classes then you can use help(vargoeshere) to see how to use it.
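A sketch of what that looks like, using a made-up function over a dict of dicts of lists:

```python
def count_events(per_user):
    """Count events per user.

    Expects a dict of dicts of lists: {user: {day: [events]}}.
    Documenting the expected shape here means help(count_events)
    can remind you later what to pass in.
    """
    return {user: sum(len(events) for events in days.values())
            for user, days in per_user.items()}

print(count_events({"ann": {"mon": [1, 2], "tue": [3]}}))  # {'ann': 3}
```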
Whatever you do, do NOT, I repeat, do NOT use Hungarian Notation! It causes severe brain & bit rot.
So, what can you do? Python and C/C++ are quite different. In C++ you typically handle polymorphic calls like so:
void doWithFooThing(FooThing *foo) {
    foo->bar();
}
Dynamic polymorphism in C++ depends on inheritance: the pointer passed to doWithFooThing may point only to instances of FooThing or one of its subclasses. Not so in Python:
def do_with_fooish(fooish):
    fooish.bar()
Here, any sufficiently fooish thing (i.e. everything that has a callable bar attribute) can be used, no matter how it is related to any other fooish thing through inheritance.
The point here is, in C++ you know what (base-)type every object has, whereas in Python you don't, and you don't care. What you try to achieve in Python is code that is reusable in as many situations as possible without having to force everything under the rigid rule of class inheritance. Your naming should also reflect that. You don't write:
def some_action(a_list):
    ...
but:
def some_action(seq):
    ...
where seq might be not only a list, but any iterable sequence, be it list, tuple, dict, set, iterator, whatever.
In general, you put emphasis on the intent of your code, instead of its type structure. Instead of writing:
dict_of_strings_to_dates = {}
you write:
users_birthdays = {}
It also helps to keep functions short, even more so than in C/C++. Then you'll be easily able to see what's going on.
Another thing: you shouldn't think of Python variables as pointers to memory. They're in fact dictionary entries:
assert foo.bar == getattr(foo, 'bar') == foo.__dict__['bar']
Not always exactly so, I admit, but the details can be looked up at docs.python.org.
And, BTW, in Python you don't declare stuff like you do in C/C++. You just define stuff.
I believe you should take a good look some of your complex structures, what you are doing with them, and ask... Is This Pythonic? Ask here on SO. I think you will find some cases where the complexity is an artifact of C/C++.
Include an example somewhere in your code, or in your tests.