exposing or hiding objects of dependencies?

exposing or hiding objects of dependencies? - python

Common scenario: I have a library that uses other libraries. For example, a math library (let's call it foo) that uses numpy.
Functions of foo can either:
return a numpy object (either pure or an inherited reimplementation)
return a list
return a foo-implemented object that behaves like numpy (performing delegation)
The three solutions can be also restated as:
foo passes through the internally used object, clearly stating that its library dependency is also a API dependency (since it returns objects obeying the interface of the numpy library)
foo makes use of a common subset of objects that are part of the basis of the language.
foo completely hides what it uses internally. Nothing about the underlying libraries escapes from the foo library to the client code.
We are of course in a pros-cons scenario. Transparent or opaque? strong coupling with the underlying tools or not? I know the drill but I am in the process of having to do this choice, and I want to share opinions before taking a decision. Suggestions, ideas, personal experience are greatly appreciated.

Since you're talking about return values, that's not really about "internal objects" -- you should just document the interfaces your returned objects will support (it's OK if that's a subset of numpy.array or whatever;-). I recommend against returning a reference to your internal mutable attributes and documenting that mutators work to alter your own object indirectly (and NOT documenting it is not much better) -- that leads to way-too-strong coupling down the road.
If you WERE talking about actual internal objects, I'd recommend the Law of Demeter -- in a simplistic reading, if the client's coding a.b.c.d.e.f(), then something is very wrong ("just one dot" may be sometimes extreme, but, "four are Right Out"). Again, the problem is strong coupling -- making it impossible for you to change your internal implementation in even minor ways without breaking a million clients...!

The main question I would think about is how much of your library would return numpy objects? If its pervasive I would go with directly returning a numpy as your so tied to numpy you might as well make it explicit. Plus it will probably make it easier to use other numpy based libraries. If on the other hand you only have a few methods that would return numpy I would either go with the numpy like object or a list, probably the numpy like object.

Related

C++ shared_ptr vs. Python object

AFAIK, the use of shared_ptr is often discouraged because of potential bugs caused by careless usage of them (unless you have a really good explanation for significant benefit and carefully checked design).
On the other hand, Python objects seem to be essentially shared_ptrs (ref_count and garbage collection).
I am wondering what makes them work nicely in Python but potentially dangerous in C++. In other words, what are the differences between Python and C++ in dealing with shared_ptr that makes their usage discouraged in C++ but not causing similar problems in Python?
I know e.g. Python automatically detects cycles between objects which prevents memory leaks that dangling cyclic shared_ptrs can cause in C++.

"I know e.g. Python automatically detects cycles" -- that's what makes them work nicely, at least so far as the "potential bugs" relate to memory leaks.
Besides which, C++ programs are more commonly written under tight performance constraints than Python programs (which IMO is a combination of different genuine requirements with some fairly bogus differences in rules-of-thumb, but that's another story). A fairly high proportion of the Python objects I use don't strictly need reference counting, they have exactly one owner and a unique_ptr would be fine (or for that matter a data member of class type). In C++ it's considered (by the people writing the advice you're reading) worth taking the performance advantage and the explicitly simplified design. In Python it's usually not considered a problem, you pay the performance and you keep the flexibility to decide later that it's shared after all without any code change required (other than to take additional references that outlive the original, I mean).
Btw in any language, shared mutable objects have "potential bugs" associated with them, if you lose track of what objects will or won't change when you're not looking at them. I don't just mean race conditions: even in a single-threaded program you need to be aware that C++ Predicates shouldn't change anything and that you (often) can't mutate a container while iterating over it. I don't see this as a difference between C++ and Python, though. Rather, to some extent you should be slightly wary of shared objects in Python too, and when you proliferate references to an object at least understand why you're doing it.
So, on to the list of issues in the question you link to:
cyclic references -- as mentioned, Python rolls its sleeves up, finds them and frees them. For reasons to do with the design and specific uses of the languages, cycle-breaking garbage collection is rather difficult to implement in C++, although not impossible.
creating multiple unrelated shared_ptrs to the same object -- no analog is possible in Python, since the reference-counter isn't open to the user to mess up.
Constructing an anonymous temporary shared pointer -- doesn't arise in Python, there's no risk of a memory leak that way in Python since there's no "gap" in which the object exists but is not yet subject to collection if it becomes unreferenced.
Calling the get() function to get the raw pointer and use it after the pointed-to object goes out of scope -- well, you can mess this up if you're writing Python/C, but not in pure Python.
Passing a reference of or a raw pointer to a shared_ptr should be dangerous too, since it won't increment the internal count -- there's no means in Python to add a reference without the language taking care of the refcount.
we passed 'this' to some thread workers instead of 'shared_from_this' -- in other words, forgot to create a shared_ptr when needed. Can't do this in Python.
most of the predicates you know and love from <functional> don't play nicely with shared_ptr -- Python refcounting is so built in to the runtime (or I suppose to be precise I should say: garbage collection is so built in to the language design) that there are no libraries that fail to cope with it.
Using shared_ptr for really small objects (like char short) could be an overhead -- issue exists in Python, and Python programmers generally don't sweat it. If you need an array of "primitive type" then you can use numpy to reduce overhead. Sometimes Python programs run out of memory and you need to do something about it, that's life ;-)
Giving out a shared_ptr< T > to this inside a class definition is also dangerous. Use enabled_shared_from_this instead -- it may not be obvious, but this is "don't create multiple unrelated shared_ptr to the same object" again.
You need to be careful when you use shared_ptr in multithread code -- it's possible to create race conditions in Python too, this is part of "shared mutable objects are tricksy".
Most of this is to do with the fact that in C++ you have to explicitly do something to get refcounting, and you don't get it if you don't ask for it. This provides several opportunities for error that Python doesn't make available to the programmer because it just does it for you. If you use shared_ptr correctly then apart from the existence of libraries that don't co-operate with it, none of these problems comes up in C++ either. Those who are cautious of using it for these reasons are basically saying they're afraid they'll use it incorrectly, or at any rate more afraid than that they'll misuse some alternative. Much of C++ programming is trading different potential bugs off against each other until you come up with a design that you consider yourself competent to execute. Furthermore it has "don't pay for what you don't need" as a design philosophy. Between these two factors, you don't do anything without a really good explanation, a significant benefit, and a carefully checked design. shared_ptr is no different ;-)

AFAIK, the use of shared_ptr is often discouraged because of potential bugs caused by careless usage of them (unless you have a really good explanation for significant benefit and carefully checked design).
I wouldn't agree. The tendency goes towards generally using these smart pointers unless you have a very good reasons not to do so.
shared_ptr that makes their usage discouraged in C++ but not causing similar problems in Python?
Well, I don't know about your favourite largish signal processing framework ecosystem, but GNU Radio uses shared_ptrs for all their blocks, which are the core elements of the GNU Radio architecture. In fact, blocks are classes, with private constructors, which are only accessible by a friend make function, which returns a shared_ptr. We haven't had problems with this -- and GNU Radio had good reason to adopt such a model. Now, we don't have a single place where users try to use deallocated block objects, not a single block is leaked. Nice!
Also, we use SWIG and a gateway class for a few C++ types that can't just be represented well as Python types. All this works very well on both sides, C++ and Python. In fact, it works so very well, that we can use Python classes as blocks in the C++ runtime, wrapped in shared_ptr.
Also, we never had performance problems. GNU Radio is a high rate, highly optimized, heavily multithreaded framework.

Is it Pythonic to use objects wherever possible?

Newbie Python question here - I am writing a little utility in Python to do disk space calculations when given the attributes of 2 different files.
Should I create a 'file' class with methods appropriate to the conversion and then create each file as an instance of that class? I'm pretty new to Python, but ok with Perl, and I believe that in Perl (I may be wrong, being self-taught), from the examples that I have seen, that most Perl is not OO.
Background info - These are IBM z/OS (mainframe) data sets, and when given the allocation attributes for a file on a specific disk type and file organisation (it's block size) and then given the allocation parameters for a different disk type & organisation, the space requirements can vary enormously.

Definition nitpicking preface: Everything in Python is technically an object, even functions and numbers. I'm going to assume you mean classes vs. functions in your question.
Actually I think one of the great things about Python is that it doesn't embrace classes for absolutely everything as some other languages (e.g., Java and C#).
It's perfectly acceptable in Python (and the built-in modules do this a lot) to define module level functions rather than encapsulating all logic in objects.
That said, classes do have their place, for example when you perform multiple actions on a single piece of data, and especially when these actions change the data and you want to keep its state encapsulated.

For Your Question and you requirements ..a short answer is "No"

The use of objects is not in itself "object oriented". Functional programming uses objects, too, just not in the same way. Python can be used in a very FP way, even though Python uses objects heavily behind the scenes.
Overuse of primitives can be a problem, but it's impossible to say whether that applies to your case without more data.
I think of OO as an interface design approach: If you are creating tools that are straightforward to interact with (and substitutable) as objects with predictable methods, then by all means, create objects. But if the interactions are straightforward to describe with module-level functions, then don't try too hard to engineer your code into classes.

First and foremost - Pythonic is a term that needs to disappear, preferably with everyone who uses it. It doesn't mean anything and it's used by people who can't use reason to justify anything, so they need a mandatory term to justify their nonsense.
But to the point you never HAVE to use object oriented concepts in your software development, as everything OOP can as easily be written with functions and solid spaghetti stringers. But the question is - do use of objects makes sense in my solution?
To understand when and how to use it, you have to ask what exactly is object oriented programming. And this was already very well explained in very old, but also free, book called Thinking in java which I consider to be the 101 bible of thinking on OO terms. I strongly urge you to grab a free copy and read couple of first chapters.
Because if you don't understand the object oriented approach, how can you apply it properly? When you do - then when to use it, or not use it, becomes clear, because you can clearly translate real life items and interactions into abstract objects. And this is the guideline - when the translation of given action, item or data to OOP model is straightforward and logical - then you should do it.

How to design code in Python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I'm coming from Java and learning Python. So far what I found very cool, yet very hard to adapt, is that there's no need to declare types. I understand that each variable is a pointer to an object, but so far I'm not able to understand how to design my code then.
For example, I'm writing a function that accepts a 2D NumPy array. Then in the body of the function I'm calling different methods of this array (which is an object of array in Numpy). But then in the future suppose I want to use this function, by that time I might have forgotten totally what I should pass to the function as a type. What do people normally do? Do they just write documentation for this? Because if that is the case, then this involves more typing and would raise the question about the idea of not declaring the type.
Also suppose I want to pass an object similar to an array in the future. Normally in Java one would implement an interface and then let both classes to implement the methods. Then in the function parameters I define the variable to be of the type of the interface. How can this issue be solved in Python or what approaches can be used to make the same idea?

This is a very healthy question.
Duck typing
The first thing to understand about python is the concept of duck typing:
If it walks like a duck, and quacks like a duck, then I call it a duck
Unlike Java, Python's types are never declared explicitly. There is no restriction, neither at compile time nor at runtime, in the type an object can assume.
What you do is simply treat objects as if they were of the perfect type for your needs. You don't ask or wonder about its type. If it implements the methods and attributes you want it to have, then that's that. It will do.
def foo(duck):
duck.walk()
duck.quack()
The only contract of this function is that duck exposes walk() and quack(). A more refined example:
def foo(sequence):
for item in sequence:
print item
What is sequence? A list? A numpy array? A dict? A generator? It doesn't matter. If it's iterable (that is, it can be used in a for ... in), it serves its purpose.
Type hinting
Of course, no one can live in constant fear of objects being of the wrong type. This is addressed with coding style, conventions and good documentation. For example:
A variable named count should hold an integer
A variable Foo starting with an upper-case letter should hold a type (class)
An argument bar whose default value is False, should hold a bool too when overridden
Note that the duck typing concept can be applied to to these 3 examples:
count can be any object that implements +, -, and <
Foo can be any callable that returns an object instance
bar can be any object that implements __nonzero__
In other words, the type is never defined explicitly, but always strongly hinted at. Or rather, the capabilities of the object are always hinted at, and its exact type is not relevant.
It's very common to use objects of unknown types. Most frameworks expose types that look like lists and dictionaries but aren't.
Finally, if you really need to know, there's the documentation. You'll find python documentation vastly superior to Java's. It's always worth the read.

I've reviewed a lot of Python code written by Java and .Net developers, and I've repeatedly seen a few issues I might warn/inform you about:
Python is not Java
Don't wrap everything in a class:
Seems like even the simplest function winds up being wrapped in a class when Java developers start writing Python. Python is not Java. Don't write getters and setters, that's what the property decorator is for.
I have two predicates before I consider writing classes:
I am marrying state with functionality
I expect to have multiple instances (otherwise a module level dict and functions is fine!)
Don't type-check everything
Python uses duck-typing. Refer to the data model. Its builtin type coercion is your friend.
Don't put everything in a try-except block
Only catch exceptions you know you'll get, using exceptions everywhere for control flow is computationally expensive and can hide bugs. Try to use the most specific exception you expect you might get. This leads to more robust code over the long run.
Learn the built-in types and methods, in particular:
From the data-model
str
join
just do dir(str) and learn them all.
list
append (add an item on the end of the list)
extend (extend the list by adding each item in an iterable)
dict
get (provide a default that prevents you from having to catch keyerrors!)
setdefault (set from the default or the value already there!)
fromkeys (build a dict with default values from an iterable of keys!)
set
Sets contain unique (no repitition) hashable objects (like strings and numbers). Thinking Venn diagrams? Want to know if a set of strings is in a set of other strings, or what the overlaps are (or aren't?)
union
intersection
difference
symmetric_difference
issubset
isdisjoint
And just do dir() on every type you come across to see the methods and attributes in its namespace, and then do help() on the attribute to see what it does!
Learn the built-in functions and standard library:
I've caught developers writing their own max functions and set objects. It's a little embarrassing. Don't let that happen to you!
Important modules to be aware of in the Standard Library are:
os
sys
collections
itertools
pprint (I use it all the time)
logging
unittest
re (regular expressions are incredibly efficient at parsing strings for a lot of use-cases)
And peruse the docs for a brief tour of the standard library, here's Part 1 and here's Part II. And in general, make skimming all of the docs an early goal.
Read the Style Guides:
You will learn a lot about best practices just by reading your style guides! I recommend:
PEP 8 (anything included in the standard library is written to this standard)
Google's Python Style Guide
Your firm's, if you have one.
Additionally, you can learn great style by Googling for the issue you're looking into with the phrase "best practice" and then selecting the relevant Stackoverflow answers with the greatest number of upvotes!
I wish you luck on your journey to learning Python!

For example I'm writing a function that accepts a 2D Numpy array. Then in the body of the function I'm calling different methods of this array (which is an object of array in Numpy). But then in the future suppose I want to use this function, by that time I might forgot totally what should I pass to the function as a type. What do people normally do? Do they just write a documentation for this?
You write documentation and name the function and variables appropriately.
def func(two_d_array):
do stuff
Also suppose I want in the future to pass an object similar to an array, normally in Java one would implement an interface and then let both classes to implement the methods.
You could do this. Create a base class and inherit from it, so that multiple types have the same interface. However, quite often, this is overkill and you'd simply use duck typing instead. With duck typing, all that matters is that the object being evaluated defines the right properties and methods required to use it within your code.
Note that you can check for types in Python, but this is generally considered bad practice because it prevents you from using duck typing and other coding patterns enabled by Python's dynamic type system.

Yes, you should document what type(s) of arguments your methods expect, and it's up to the caller to pass the correct type of object. Within a method, you can write code to check the types of each argument, or you can just assume it's the correct type, and rely on Python to automatically throw an exception if the passed-in object doesn't support the methods that your code needs to call on it.
The disadvantage of dynamic typing is that the computer can't do as much up-front correctness checking, as you've noted; there's a greater burden on the programmer to make sure that all arguments are of the right type. But the advantage is that you have much more flexibility in what types can be passed to your methods:
You can write a method that supports several different types of objects for a particular argument, without needing overloads and duplicated code.
Sometimes a method doesn't really care about the exact type of an object as long as it supports a particular method or operation — say, indexing with square brackets, which works on strings, arrays, and a variety of other things. In Java you'd have to create an interface, and write wrapper classes to adapt various pre-existing types to that interface. In Python you don't need to do any of that.

You can use assert to check if conditions match:
In [218]: def foo(arg):
...: assert type(arg) is np.ndarray and np.rank(arg)==2, \
...: 'the argument must be a 2D numpy array'
...: print 'good arg'
In [219]: foo(np.arange(4).reshape((2,2)))
good arg
In [220]: foo(np.arange(4))
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-220-c0ee6e33c83d> in <module>()
----> 1 foo(np.arange(4))
<ipython-input-218-63565789690d> in foo(arg)
1 def foo(arg):
2 assert type(arg) is np.ndarray and np.rank(arg)==2, \
----> 3 'the argument must be a 2D numpy array'
4 print 'good arg'
AssertionError: the argument must be a 2D numpy array
It's always better to document what you've written completely as #ChinmayKanchi mentioned.

Here are a few pointers that might help you make your approach more 'Pythonic'.
The PEPs
In general, I recommend at least browsing through the PEPs. It helped me a lot to grok Python.
Pointers
Since you mentioned the word pointers, Python doesn't use pointers to objects in the sense that C uses pointers. I am not sure about the relationship to Java. Python uses names attached to objects. It's a subtle but important difference that can cause you problems if you expect similar-to-C pointer behavior.
Duck Typing
As you said, yes, if you are expecting a certain type of input you put it in the docstring.
As zhangxaochen wrote, you can use assert to do realtime typing of your arguments, but that's not really the python way if you are doing it all the time with no particular reason. As others mentioned, it's better to test and raise a TypeError if you have to do this. Python favors duck typing instead - if you send me something that quacks like a numpy 2D array, then that's fine.

Should methods in a class be classmethod by default?

I was just working on a large class hierarchy and thought that probably all methods in a class should be classmethods by default.
I mean that it is very rare that one needs to change the actual method for an object, and whatever variables one needs can be passed in explicitly. Also, this way there would be lesser number of methods where people could change the object itself (more typing to do it the other way), and people would be more inclined to be "functional" by default.
But, I am a newb and would like to find out the flaws in my idea (if there are any :).

Having classmethods as a default is a well-known but outdated paradigm. It's called Modular Programming. Your classes become effectively modules this way.
The Object-Oriented Paradigm (OOP) is mostly considered superior to the Modular Paradigm (and it is younger). The main difference is exactly that parts of code are associated by default to a group of data (called an object) — and thus not classmethods.
It turns out in practice that this is much more useful. Combined with other OOP architectural ideas like inheritance this offers directer ways to represent the models in the heads of the developers.
Using object methods I can write abstract code which can be used for objects of various types; I don't have to know the type of the objects while writing my routine. E. g. I can write a max() routine which compares the elements of a list with each other to find the greatest. Comparing then is done using the > operator which is effectively an object method of the element (in Python this is __gt__(), in C++ it would be operator>() etc.). Now the object itself (maybe a number, maybe a date, etc.) can handle the comparison of itself with another of its type. In code this can be written as short as
a > b # in Python this calls a.__gt__(b)
while with only having classmethods you would have to write it as
type(a).__gt__(a, b)
which is much less readable.

If the method doesn't access any of an object's state, but is specific to that object's class, then it's a good candidate for being a classmethod.
Otherwise if it's more general, then just use a function defined at module level, no need to make it belong to a specific class.
I've found that classmethods are actually pretty rare in practice, and certainly not the default. There should be plenty of good code out there (on e.g. github) to get examples from.

Declaring types for complex data structures in python

I am quite new to python programming (C/C++ background).
I'm writing code where I need to use complex data structures like dictionaries of dictionaries of lists.
The issue is that when I must use these objects I barely remember their structure and so how to access them.
This makes it difficult to resume working on code that was untouched for days.
A very poor solution is to use comments for each variable, but that's very inflexible.
So, given that python variables are just pointers to memory and they cannot be statically type-declared, is there any convention or rule that I could follow to ease complex data structures usage?

If you use docstrings in your classes then you can use help(vargoeshere) to see how to use it.

Whatever you do, do NOT, I repeat, do NOT use Hungarian Notation! It causes severe brain & bit rot.
So, what can you do? Python and C/C++ are quite different. In C++ you typically handle polymorphic calls like so:
void doWithFooThing(FooThing *foo) {
foo->bar();
}
Dynamic polymorphism in C++ depends on inheritance: the pointer passed to doWithFooThing may point only to instances of FooThing or one of its subclasses. Not so in Python:
def do_with_fooish(fooish):
fooish.bar()
Here, any sufficiently fooish thing (i.e. everything that has a callable bar attribute) can be used, no matter how it is releated to any other fooish thing through inheritance.
The point here is, in C++ you know what (base-)type every object has, whereas in Python you don't, and you don't care. What you try to achieve in Python is code that is reusable in as many situations as possible without having to force everthing under the rigid rule of class inheritance. Your naming should also reflect that. You dont write:
def some_action(a_list):
...
but:
def some_action(seq):
...
where seq might be not only a list, but any iterable sequence, be it list, tuple, dict, set, iterator, whatever.
In general, you put emphasis on the intent of your code, instead of its the type structure. Instead of writing:
dict_of_strings_to_dates = {}
you write:
users_birthdays = {}
It also helps to keep functions short, even more so than in C/C++. Then you'll be easily able to see what's going on.
Another thing: you shouldn't think of Python variables as pointers to memory. They're in fact dicionary entries:
assert foo.bar == getattr(foo, 'bar') == foo.__dict__['bar']
Not always exactly so, I concur, but the details can be looked up at docs.python.org.
And, BTW, in Python you don't declare stuff like you do in C/C++. You just define stuff.

I believe you should take a good look some of your complex structures, what you are doing with them, and ask... Is This Pythonic? Ask here on SO. I think you will find some cases where the complexity is an artifact of C/C++.

Include an example somewhere in your code, or in your tests.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.