How to design code in Python?

I'm coming from Java and learning Python. So far, one thing I find very cool, yet very hard to adapt to, is that there's no need to declare types. I understand that each variable is a pointer to an object, but I don't yet see how to design my code around that.
For example, I'm writing a function that accepts a 2D NumPy array. Then in the body of the function I call different methods of this array (which is a NumPy array object). But suppose I want to use this function in the future; by then I might have totally forgotten what type I should pass to it. What do people normally do? Do they just write documentation for this? If so, that involves more typing and seems to undercut the whole idea of not declaring types.
Also, suppose I want to pass an array-like object in the future. Normally in Java one would define an interface and then let both classes implement its methods, and declare the parameter to be of the interface type. How can this be solved in Python, or what approaches can be used to achieve the same idea?

This is a very healthy question.
Duck typing
The first thing to understand about Python is the concept of duck typing:
If it walks like a duck, and quacks like a duck, then I call it a duck.
Unlike in Java, types in Python are never declared explicitly. There is no restriction, at compile time or at run time, on the type of object a name can refer to.
What you do is simply treat objects as if they were of the perfect type for your needs. You don't ask or wonder about their type. If an object implements the methods and attributes you want it to have, that's that. It will do.
def foo(duck):
    duck.walk()
    duck.quack()
The only contract of this function is that duck exposes walk() and quack(). A more refined example:
def foo(sequence):
    for item in sequence:
        print(item)
What is sequence? A list? A numpy array? A dict? A generator? It doesn't matter. If it's iterable (that is, it can be used in a for ... in), it serves its purpose.
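For instance, all of these calls work with the foo above, with no change to the function:
foo([1, 2, 3])                   # a list
foo((x * x for x in range(3)))   # a generator expression
foo({'a': 1, 'b': 2})            # a dict (iterating yields its keys)
foo('abc')                       # even a string (iterating yields its characters)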
Type hinting
Of course, no one can live in constant fear of objects being of the wrong type. This is addressed with coding style, conventions and good documentation. For example:
A variable named count should hold an integer
A variable Foo starting with an upper-case letter should hold a type (class)
An argument bar whose default value is False, should hold a bool too when overridden
Note that the duck typing concept can be applied to these three examples:
count can be any object that implements +, -, and <
Foo can be any callable that returns an object instance
bar can be any object that implements __nonzero__ (__bool__ in Python 3)
In other words, the type is never defined explicitly, but always strongly hinted at. Or rather, the capabilities of the object are always hinted at, and its exact type is not relevant.
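To illustrate the count example above with a small sketch (count_down is a made-up helper): any object that supports comparison and subtraction can play the role of count, whether or not it is an int:
from decimal import Decimal
from fractions import Fraction

def count_down(count):
    # needs only <, - and printing, so any number-like object will do
    while 0 < count:
        print(count)
        count = count - 1

count_down(3)                  # an int: 3, 2, 1
count_down(Fraction(5, 2))     # 5/2, 3/2, 1/2
count_down(Decimal('1.5'))     # 1.5, 0.5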
It's very common to use objects of unknown types. Most frameworks expose types that look like lists and dictionaries but aren't.
Finally, if you really need to know, there's the documentation. You'll find Python documentation vastly superior to Java's. It's always worth the read.

I've reviewed a lot of Python code written by Java and .Net developers, and I've repeatedly seen a few issues I might warn/inform you about:
Python is not Java
Don't wrap everything in a class:
Seems like even the simplest function winds up being wrapped in a class when Java developers start writing Python. Python is not Java. Don't write getters and setters; that's what the property decorator is for.
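A rough sketch of what that looks like (the Temperature class here is purely illustrative):
class Temperature(object):
    def __init__(self, celsius=0.0):
        self._celsius = celsius

    @property
    def fahrenheit(self):
        # computed on attribute access: t.fahrenheit, no getFahrenheit() needed
        return self._celsius * 9.0 / 5.0 + 32.0

    @fahrenheit.setter
    def fahrenheit(self, value):
        # plain attribute assignment: t.fahrenheit = 98.6
        self._celsius = (value - 32.0) * 5.0 / 9.0

t = Temperature(100.0)
print(t.fahrenheit)   # 212.0
t.fahrenheit = 32.0
print(t._celsius)     # 0.0
Callers read and assign attributes as usual; if you later need logic behind an attribute, you can add a property without changing any calling code.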
I have two predicates before I consider writing classes:
I am marrying state with functionality
I expect to have multiple instances (otherwise a module-level dict and functions are fine!)
Don't type-check everything
Python uses duck typing. Refer to the data model. Its built-in type coercion is your friend.
Don't put everything in a try-except block
Only catch exceptions you know you'll get; using exceptions everywhere for control flow is computationally expensive and can hide bugs. Catch the most specific exception you expect. This leads to more robust code over the long run.
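A hedged sketch of the difference (config.txt and the fallback value are made up for illustration):
# Too broad: hides typos, missing attributes, anything at all
try:
    value = int(open('config.txt').read())
except Exception:
    value = 0

# Better: catch only what you actually expect
try:
    value = int(open('config.txt').read())
except (IOError, ValueError):   # unreadable file, or contents not a number
    value = 0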
Learn the built-in types and methods, in particular:
From the data-model
str
join
just do dir(str) and learn them all.
list
append (add an item on the end of the list)
extend (extend the list by adding each item in an iterable)
dict
get (provide a default that saves you from having to catch KeyErrors; see the short sketch after this list!)
setdefault (return the value if the key is already there, otherwise insert and return the default!)
fromkeys (build a dict with default values from an iterable of keys!)
set
Sets contain unique (no repetition) hashable objects (like strings and numbers). Thinking in Venn diagrams? Want to know whether one set of strings is contained in another, or what the overlaps are (or aren't)?
union
intersection
difference
symmetric_difference
issubset
isdisjoint
And just do dir() on every type you come across to see the methods and attributes in its namespace, and then do help() on the attribute to see what it does!
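Here is the short sketch promised above, showing a few of those dict and set methods on made-up data:
stock = {'apples': 3, 'pears': 0}
print(stock.get('plums', 0))                   # 0, and no KeyError raised
stock.setdefault('plums', 5)                   # inserts 5 because 'plums' was missing
counts = dict.fromkeys(['a', 'b', 'c'], 0)     # {'a': 0, 'b': 0, 'c': 0}

wanted = {'apples', 'plums', 'cherries'}
have = set(stock)                              # the dict's keys as a set
print(sorted(wanted & have))                   # ['apples', 'plums']  (intersection)
print(sorted(wanted - have))                   # ['cherries']         (difference)
print(wanted.issubset(have))                   # False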
Learn the built-in functions and standard library:
I've caught developers writing their own max functions and set objects. It's a little embarrassing. Don't let that happen to you!
Important modules to be aware of in the Standard Library are:
os
sys
collections
itertools
pprint (I use it all the time; see the short example after this list)
logging
unittest
re (regular expressions are incredibly efficient at parsing strings for a lot of use-cases)
And peruse the brief tour of the standard library in the docs (Part I and Part II). In general, make skimming all of the docs an early goal.
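As a tiny example of why pprint earns its keep (the data is invented), see how it breaks a nested structure into readable lines:
from pprint import pprint

servers = {
    'web-%d' % i: {'ip': '10.0.0.%d' % i, 'port': 8000 + i, 'roles': ['app', 'cache']}
    for i in range(3)
}
print(servers)    # one long, hard-to-read line
pprint(servers)   # one server per line, much easier to scan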
Read the Style Guides:
You will learn a lot about best practices just by reading your style guides! I recommend:
PEP 8 (anything included in the standard library is written to this standard)
Google's Python Style Guide
Your firm's, if you have one.
Additionally, you can learn great style by Googling for the issue you're looking into with the phrase "best practice" and then selecting the relevant Stack Overflow answers with the greatest number of upvotes!
I wish you luck on your journey to learning Python!

For example, I'm writing a function that accepts a 2D NumPy array. Then in the body of the function I'm calling different methods of this array (which is a NumPy array object). But suppose I want to use this function in the future; by then I might have totally forgotten what I should pass to it. What do people normally do? Do they just write documentation for this?
You write documentation and name the function and variables appropriately.
def func(two_d_array):
    ...  # do stuff with the 2D array
Also suppose I want to pass an array-like object in the future. Normally in Java one would define an interface and then let both classes implement its methods.
You could do this. Create a base class and inherit from it, so that multiple types have the same interface. However, quite often, this is overkill and you'd simply use duck typing instead. With duck typing, all that matters is that the object being evaluated defines the right properties and methods required to use it within your code.
Note that you can check for types in Python, but this is generally considered bad practice because it prevents you from using duck typing and other coding patterns enabled by Python's dynamic type system.
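If you really do want an explicit, Java-style contract, the abc module gives you abstract base classes. A minimal sketch, with invented class names, just to show the shape of the approach:
from abc import ABC, abstractmethod

class Grid(ABC):
    """Anything that behaves like a 2D grid of numbers."""

    @abstractmethod
    def shape(self):
        """Return (rows, columns)."""

    @abstractmethod
    def value_at(self, row, col):
        """Return the value stored at (row, col)."""

class ListGrid(Grid):
    def __init__(self, rows):
        self._rows = rows

    def shape(self):
        return (len(self._rows), len(self._rows[0]))

    def value_at(self, row, col):
        return self._rows[row][col]

def total(grid):
    rows, cols = grid.shape()
    return sum(grid.value_at(r, c) for r in range(rows) for c in range(cols))

print(total(ListGrid([[1, 2], [3, 4]])))   # 10
Note that total() would work just as well with any object exposing shape() and value_at(), whether or not it inherits from Grid; the base class mainly documents the contract and catches incomplete implementations early.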

Yes, you should document what type(s) of arguments your methods expect, and it's up to the caller to pass the correct type of object. Within a method, you can write code to check the types of each argument, or you can just assume it's the correct type, and rely on Python to automatically throw an exception if the passed-in object doesn't support the methods that your code needs to call on it.
The disadvantage of dynamic typing is that the computer can't do as much up-front correctness checking, as you've noted; there's a greater burden on the programmer to make sure that all arguments are of the right type. But the advantage is that you have much more flexibility in what types can be passed to your methods:
You can write a method that supports several different types of objects for a particular argument, without needing overloads and duplicated code.
Sometimes a method doesn't really care about the exact type of an object as long as it supports a particular method or operation — say, indexing with square brackets, which works on strings, arrays, and a variety of other things. In Java you'd have to create an interface, and write wrapper classes to adapt various pre-existing types to that interface. In Python you don't need to do any of that.
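For instance (a small sketch; first_and_last is just a made-up name):
def first_and_last(seq):
    # works on anything that supports indexing with []
    return seq[0], seq[-1]

print(first_and_last('hello'))        # ('h', 'o')
print(first_and_last([10, 20, 30]))   # (10, 30)
print(first_and_last((1.5, 2.5)))     # (1.5, 2.5)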

You can use assert to check if conditions match:
In [218]: def foo(arg):
     ...:     assert isinstance(arg, np.ndarray) and arg.ndim == 2, \
     ...:         'the argument must be a 2D numpy array'
     ...:     print('good arg')
In [219]: foo(np.arange(4).reshape((2, 2)))
good arg
In [220]: foo(np.arange(4))
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-220-c0ee6e33c83d> in <module>()
----> 1 foo(np.arange(4))
<ipython-input-218-63565789690d> in foo(arg)
      1 def foo(arg):
      2     assert isinstance(arg, np.ndarray) and arg.ndim == 2, \
----> 3         'the argument must be a 2D numpy array'
      4     print('good arg')
AssertionError: the argument must be a 2D numpy array
It's always better to document what you've written completely, as @ChinmayKanchi mentioned.

Here are a few pointers that might help you make your approach more 'Pythonic'.
The PEPs
In general, I recommend at least browsing through the PEPs. It helped me a lot to grok Python.
Pointers
Since you mentioned the word pointers: Python doesn't have pointers to objects in the sense that C does (I'm not sure how this relates to Java). Python uses names attached to objects. It's a subtle but important difference that can cause you problems if you expect C-like pointer behavior.
Duck Typing
As you said, yes, if you are expecting a certain type of input you put it in the docstring.
As zhangxaochen wrote, you can use assert to check argument types at run time, but that's not really the Python way if you do it everywhere for no particular reason. As others mentioned, if you must check, it's better to test and raise a TypeError. Python favors duck typing instead: if you send me something that quacks like a 2D NumPy array, then that's fine.
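A brief sketch of that, reusing the 2D NumPy array case from the question (smooth is an invented name):
import numpy as np

def smooth(image):
    if not isinstance(image, np.ndarray) or image.ndim != 2:
        raise TypeError('smooth() expects a 2D numpy array, got %r' % type(image))
    # ... the actual work on the array would go here ...
    return image.mean()
Unlike an assert, the TypeError is raised even when Python runs with optimizations (-O), and it tells the caller exactly what went wrong.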

Related

Should I often create a class to indicate the type in Python?

I have to learn Python now.
It feels like quite the opposite of strongly typed languages such as OCaml and Java; types largely fade into the background.
For example, when I read someone's code, I have no idea what the input and the output are. Sometimes it's a list of dicts whose values are dicts whose values are again lists.
What I have to do is run it and print the result; only then do I know.
I haven't got used to this.
How do I get used to it?
Should I create some classes just to indicate the type?
What are the general rules?
Try to focus on what you want your code to do and don't worry about the types. Is that a list? Or is it some custom class that wraps a list? If you need to iterate over it then you actually don't need to know. You care about the contents, not the details of the container. So Python provides iteration, which allows you to do something like:
for each in stuff:
    do_something(each)
Similarly, if you want to see if something is in some container, then you can use:
if something in stuff:
    do_something(stuff)
    # or perhaps in some other cases:
    something.some_method()
    # or
    stuff.do_it_with(something)
... and it doesn't matter whether stuff is a list, a set, a dictionary, a tuple, an SQL query result set, a DBM mapping (indexed file) etc. Those are implementation details.
When you have an object you care whether it implements the semantics you're trying to call upon. Whether that object is of a certain type ... or whether it's some sort of proxy or wrapper around some instance of that type ... or some alternative type offering the same functionality ... in Python all those are treated as irrelevant. If it provides the methods (which promise the desired semantics) then call them. If you need to handle the possibility that the object doesn't support the desired methods then wrap the call in an exception handler (try: ... except ...).
(I realize my code might look rather meaningless. It's valid code; but your question is sufficiently abstract that it's hard to give a meaningful example. The point I'm trying to make is that Python coding allows you to focus on semantics rather than the type/casting details).
Comments, docstrings and doctests go a very long way in helping you figure out what a function expects as inputs and outputs. Python 3 also added type annotations, which the interpreter doesn't enforce, but which help the reader tremendously:
def find_key_max(d: dict):
    return max(d.keys())

def find_value_max(d: dict):
    return max(d.values())
Just by giving the functions useful names and adding type annotations, I've made it fairly clear what they do. A docstring would have been even more helpful:
def find_key_max(d: dict):
    """Finds the largest key in the dictionary d."""
    return max(d.keys())
Of course, this only applies to cases where the type of the input argument really matters. The magic of Python is that often the exact type doesn't matter. If I passed a list instead of a dict to the function above, the error message would clearly say what the issue was. Such errors are very easy to catch with a simple smoke test.
Besides, many programmers (myself included) feel they don't need a compiler looking over their shoulder telling them whether they can or cannot call MyObject.my_method. If I want to, I will; get out of my way, compiler! Trust me, I know what I'm doing. This philosophy lets you write code much faster, because you don't have to work to please a compiler. If something quacks like a duck (it has a quack method), then it is of type Duck. I don't care if the compiler thinks it is of type Chicken; it quacks and will therefore do the job.
If you really need to know what type something is, use the isinstance function:
if isinstance(variable, int):
    ...  # Do integer stuff here
Alternatively, if you want to force the input to be of a specific type:
def my_function(value):
    assert isinstance(value, int), "value is not an integer!"
I recommend you document everything as always.
def my_function(value):
    """Calculate some result.

    Args:
        value (int): This is an integer input
    Return:
        (int): the input variable multiplied by 2
    """
    return value * 2
# end my_function
There are many different ways to document code in Python. I suggest that you pick something compatible with Sphinx; there are several Sphinx plugins available for different docstring styles.

Should methods in a class be classmethod by default?

I was just working on a large class hierarchy and thought that probably all methods in a class should be classmethods by default.
I mean that it is very rare that one needs to modify the actual object inside a method, and whatever variables one needs can be passed in explicitly. Also, this way there would be fewer methods where people could change the object itself (more typing to do it the other way), and people would be more inclined to be "functional" by default.
But, I am a newb and would like to find out the flaws in my idea (if there are any :).
Having classmethods as a default is a well-known but outdated paradigm. It's called Modular Programming. Your classes become effectively modules this way.
The Object-Oriented Paradigm (OOP) is mostly considered superior to the Modular Paradigm (and it is younger). The main difference is exactly that pieces of code are associated by default with a group of data (called an object), and are thus instance methods rather than classmethods.
It turns out in practice that this is much more useful. Combined with other OOP ideas such as inheritance, it offers more direct ways to represent the models in the developers' heads.
Using object methods I can write abstract code that works for objects of various types; I don't have to know the type of the objects while writing my routine. For example, I can write a max() routine that compares the elements of a list with each other to find the greatest. The comparison is done using the > operator, which is effectively an object method of the element (in Python this is __gt__(); in C++ it would be operator>(), etc.). Now the object itself (maybe a number, maybe a date, etc.) handles the comparison of itself with another of its type. In code this can be written as concisely as
a > b # in Python this calls a.__gt__(b)
while with only having classmethods you would have to write it as
type(a).__gt__(a, b)
which is much less readable.
If the method doesn't access any of an object's state, but is specific to that object's class, then it's a good candidate for being a classmethod.
Otherwise if it's more general, then just use a function defined at module level, no need to make it belong to a specific class.
I've found that classmethods are actually pretty rare in practice, and certainly not the default. There should be plenty of good code out there (e.g. on GitHub) to get examples from.
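A rough sketch of how those guidelines play out (the Account class is invented for illustration):
class Account:
    rate = 0.02                      # class-level data shared by all accounts

    def __init__(self, balance):
        self.balance = balance       # per-instance state

    def add_interest(self):          # touches instance state: an instance method
        self.balance += self.balance * self.rate

    @classmethod
    def from_cents(cls, cents):      # needs only the class: a classmethod
        return cls(cents / 100.0)

def total_balance(accounts):         # general-purpose: a plain module-level function
    return sum(a.balance for a in accounts)

a = Account.from_cents(12500)
a.add_interest()
print(total_balance([a]))            # 127.5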

Why does Python use built-ins instead of object methods (like Ruby, etc.)?

Possible Duplicate:
Why does python use 'magic methods'?
Just to clarify, this is NOT a Python vs. Ruby question. I've been a Python user for about a year. I like it a lot, except for some of the funky built-ins versus having methods. For example, why do you have to do len(listA) instead of listA.length() or listA.size() like in Ruby or Java? Coming from a Java background, it seems intuitive to have listA.length() instead of len() (it's also easier in an IDE to discover the methods an object has than to learn the built-ins). Could anyone explain the reason for this design choice?
Mainly due to history. Python's ancestor was a teaching language called ABC, which was, I gather, somewhat BASIC-like.
Personally, I think it's kind of nice to know that I can always get the length of something with len(thing) without having to know what the implementer of a particular type called their equivalent method: len()? length()? getLength()? You can have all the conventions you like, but when the len() built-in function breaks on your object because you didn't implement __len__(), you can be sure everyone will name their length method __len__().
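A tiny sketch of that protocol, with an invented class:
class Playlist:
    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        # this is what the len() built-in calls
        return len(self._songs)

p = Playlist(['a', 'b', 'c'])
print(len(p))   # 3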
Your point about IDEs auto-completing methods applies only if the IDE knows what type of object a given name is bound to, which is not always possible in a dynamic language like Python. At least, not without executing the code.
BTW, some of what you think are built-in functions (such as str(), int(), and float()) are not functions but types. (In other words, str(42) is a string construction, not a function call that returns a string.) These are perfectly cromulent even if you're an OO stickler.

Declaring types for complex data structures in Python

I am quite new to python programming (C/C++ background).
I'm writing code where I need to use complex data structures like dictionaries of dictionaries of lists.
The issue is that when I come back to use these objects, I barely remember their structure and how to access them.
This makes it difficult to resume working on code that was untouched for days.
A very poor solution is to use comments for each variable, but that's very inflexible.
So, given that Python variables are just pointers to memory and cannot be statically type-declared, is there any convention or rule I could follow to make working with complex data structures easier?
If you use docstrings in your classes then you can use help(vargoeshere) to see how to use it.
Whatever you do, do NOT, I repeat, do NOT use Hungarian Notation! It causes severe brain & bit rot.
So, what can you do? Python and C/C++ are quite different. In C++ you typically handle polymorphic calls like so:
void doWithFooThing(FooThing *foo) {
    foo->bar();
}
Dynamic polymorphism in C++ depends on inheritance: the pointer passed to doWithFooThing may point only to instances of FooThing or one of its subclasses. Not so in Python:
def do_with_fooish(fooish):
    fooish.bar()
Here, any sufficiently fooish thing (i.e. anything that has a callable bar attribute) can be used, no matter how it is related to any other fooish thing through inheritance.
The point here is that in C++ you know what (base) type every object has, whereas in Python you don't, and you don't care. What you try to achieve in Python is code that is reusable in as many situations as possible without having to force everything under the rigid rule of class inheritance. Your naming should also reflect that. You don't write:
def some_action(a_list):
    ...
but:
def some_action(seq):
    ...
where seq might be not only a list, but any iterable sequence, be it list, tuple, dict, set, iterator, whatever.
In general, you put the emphasis on the intent of your code instead of its type structure. Instead of writing:
dict_of_strings_to_dates = {}
you write:
users_birthdays = {}
It also helps to keep functions short, even more so than in C/C++. Then you'll be easily able to see what's going on.
Another thing: you shouldn't think of Python variables as pointers to memory. They're in fact dictionary entries:
assert foo.bar == getattr(foo, 'bar') == foo.__dict__['bar']
Not always exactly so, I concur, but the details can be looked up at docs.python.org.
And, BTW, in Python you don't declare stuff like you do in C/C++. You just define stuff.
I believe you should take a good look at some of your complex structures, what you are doing with them, and ask... is this Pythonic? Ask here on SO. I think you will find some cases where the complexity is an artifact of C/C++.
Include an example somewhere in your code, or in your tests.
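One way to do that, sketched with an invented structure: a docstring plus a doctest records the shape of the data and checks it at the same time:
def group_scores_by_team(records):
    """Turn a list of (team, player, score) tuples into
    {team: {player: [scores]}}.

    >>> group_scores_by_team([('red', 'ann', 3), ('red', 'ann', 5)])
    {'red': {'ann': [3, 5]}}
    """
    result = {}
    for team, player, score in records:
        result.setdefault(team, {}).setdefault(player, []).append(score)
    return result

if __name__ == '__main__':
    import doctest
    doctest.testmod()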

What to consider before subclassing list?

I was recently going over a coding problem I was having and someone looking at the code said that subclassing list was bad (my problem was unrelated to that class). He said that you shouldn't do it and that it came with a bunch of bad side effects. Is this true?
I'm asking if list is generally bad to subclass and if so, what are the reasons. Alternately, what should I consider before subclassing list in Python?
The abstract base classes provided in the collections module, particularly MutableSequence, can be useful when implementing list-like classes. These are available in Python 2.6 and later.
With ABCs you can implement the "core" functionality of your class and it will provide the methods which logically depend on what you've defined.
For example, implementing __getitem__ and __len__ in a collections.Sequence-derived class is enough to also provide your class with __contains__, __iter__, and other methods.
You may still want to use a contained list object to do the heavy lifting.
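A minimal sketch of that approach (using collections.abc, where these classes live in Python 3; the class itself is invented):
from collections.abc import Sequence

class SortedLetters(Sequence):
    def __init__(self, letters):
        self._data = sorted(letters)     # a contained list does the heavy lifting

    def __getitem__(self, index):
        return self._data[index]

    def __len__(self):
        return len(self._data)

s = SortedLetters('dcba')
print(s[0])                 # 'a'
print('c' in s)             # True: __contains__ comes from the ABC
print(list(reversed(s)))    # ['d', 'c', 'b', 'a']: __reversed__ does too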
There are no benefits to subclassing list. None of the built-in methods will call any methods you override, so you can get unexpected bugs. Further, it's often confusing to do things like self.append instead of self.foos.append, or especially self[4] rather than self.foos[4], to access your data. You can make something that works exactly like a list, or (better) exactly as much like a list as you really need, while subclassing nothing but object.
I think the first question I'd ask myself is, "Is my new object really a list?". Does it walk like a list, talk like a list? Or is it something else?
If it is a list, then all the standard list methods should all make sense.
If the standard list methods don't make sense, then your object should contain a list, not be a list.
In old Python (2.2?) subclassing list was a bad idea for various technical reasons, but in a modern Python it is fine.
Nick is correct.
Also, while I can't speak to Python, in other OO languages (Java, Smalltalk) subclassing a list is a bad idea. Inheritance in general should be avoided and delegation-composition used instead.
Rather, you make a container class and delegate calls to the list. The container class has a reference to the list and you can even expose the calls and returns of the list in your own methods.
This adds flexibility and allows you to change the implementation (a different list type or data structure) later without breaking any code. If you want your list to do different list-like things then your container can do this and use the plain list as a simple data structure.
Imagine if you had 47 different uses of lists. Do you really want to maintain 47 different subclasses?
Instead you could do this via the container and interfaces. One class to maintain and allow people to call your new and improved methods via the interface(s) with the implementation remaining hidden.
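A short sketch of that container-and-delegation idea (the class and method names are invented):
class TaskQueue:
    """Behaves a little like a list, but only exposes what callers need."""

    def __init__(self):
        self._tasks = []                 # a plain list as the underlying data structure

    def add(self, task):
        self._tasks.append(task)         # delegate to the list

    def next_task(self):
        return self._tasks.pop(0)

    def __len__(self):
        return len(self._tasks)

q = TaskQueue()
q.add('write docs')
q.add('fix bug')
print(len(q))            # 2
print(q.next_task())     # 'write docs'
If the plain list ever becomes a bottleneck, the internal storage can be swapped for something like collections.deque without touching any calling code.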
